(57) Abstract: The invention relates to a network element (1) and a method for enhancing the quality of digitised analogue signals
transmitted in parameterised coded form via a digital network. In order to enable an enhancement of the quality of the digitised
analogue signals on the network side, the network element comprises means (20, 21) for extracting signals from and inserting signals into
the network, first processing means (24) for processing the extracted parameters in the parameter domain with functions suitable to
enhance the quality of the digitised analogue signals, and second processing means (26) for processing the extracted parameters in
the linear domain with functions suitable to enhance the quality of the digitised analogue signals. In addition, analysing and
selecting means (23, 27) determine the expected enhancement of quality in the different processing domains and cause a corresponding
insertion of processed signals back into the network. The proposed method comprises corresponding steps.
AUDIO SIGNAL QUALITY ENHANCEMENT IN A DIGITAL NETWORK



FIELD OF THE INVENTION 

The invention relates to a network element and a method 
for enhancing the quality of digitised analogue signals 
transmitted in parameterised coded form via a digital 
network.

BACKGROUND OF THE INVENTION 

Digital networks like packet based IP (Internet Protocol)
networks or TDM (Time Division Multiplex) based networks
are employed to transmit not only data traffic but
also digitised analogue signals, in particular audio
signals like speech, and video signals.

Before a digitised analogue signal can be transmitted by
the digital network, an analogue-to-digital conversion of 
the signal has to be carried out. Further, the signal is 
usually compressed, e.g. with a ratio of 8:1 or 4:1, to 
allow a low bit rate access to the core network and for 
capacity savings within the core network itself. 

When transferring voice between two IP terminals, for 
example, the speech is converted and compressed by an 
encoder in the source terminal to form parameterised 
coded digitised analogue signals and decompressed and 
reconverted by a decoder in the destination terminal and
vice versa. 

The quality of the speech presented to an end user at the
respective destination terminal depends on a variety of
factors.

A first group of factors is network related and comprises
delay, lost packets etc. on the transmission route.

A second group of factors is terminal related and 
comprises the quality of the microphone, the 
loudspeakers, the A/D converter, the automatic level 
control, the echo canceller, the noise suppressor etc. A 
further terminal related factor is the surroundings of 
the terminal, like environmental noise. Besides the
differing quality of the speech enhancement features or
services employed, some of the terminals might even
completely lack certain speech enhancement features or
services which would be useful to increase the
satisfaction of the end user.

A third group of factors appears when several networks 
are involved in one transmission, e.g. when an IP 
terminal inter-works with the PSTN (Public Switched 
Telephone Network) or a mobile access network. In such a 
case, additional degradations may result from echo from 
PSTN hybrids or from acoustic noise from mobile terminals 
etc. IP-PSTN gateways are utilised to enable the
interworking between the IP network and the PSTN or the
mobile access network. These gateways may include features
for enhancing the quality of the speech they transmit.
However, some gateways lack important speech
enhancement features.

In digital networks, usually nothing is done to
compensate for the terminal or the network transition 
specific factors on the network side. 

For GSM (Global System for Mobile communication)
networks, the TFO (Tandem Free Operation) specification of
ETSI (European Telecommunications Standards Institute)
defines how multiple encoding and decoding, especially at
gateways and switches, can be avoided. When complying with
the TFO model, a transmitted TFO stream includes
parameterised coded speech that goes end-to-end in the
speech parameter domain. The end-points may be two mobiles
or a mobile and an IP terminal via a gateway. Two IP
terminals interconnected only by an IP network involve a
TFO by nature. The same principles are valid for GPRS
(General Packet Radio Service) and third generation
networks, where the speech may stay all the way in the
packet based network. Exemplary routes of the latter are:
MS-BS-RNC-SGSN-GGSN-IP terminal or MS-BS-PCU-SGSN-GGSN-IP
terminal (MS: Mobile Station; BS: Base Station; RNC:
Radio Network Controller; SGSN: Serving GPRS Support
Node; GGSN: Gateway GPRS Support Node; PCU: Packet
Control Unit). However, until end-to-end TFO connections
are realised in all networks, the transition factors
influencing the quality of transmitted digitised analogue
signals still have to be considered, and the terminal
specific factors are not affected by the TFO approach
anyway.
On the whole, it would be beneficial if digital networks
provided means for enhancing the quality of digitised
analogue signals. Multiple encoding and decoding,
however, should be avoided for quality reasons.

For packet based networks, ITU-T specification H.323 
(07/2000) introduces a multipoint processor (MP) used for 
conference calls. The multipoint processor prepares N
audio outputs from M audio inputs by switching and/or
mixing. For mixing, the input audio signals are decoded 
to linear signals on which a linear combination is 
performed. The resulting signal is encoded again to the 
appropriate audio format. It is proposed that the 
multipoint processor moreover eliminates or attenuates 
some of the input signals in order to reduce noise and 
other unwanted signals. 
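The mixing step of the multipoint processor can be illustrated with a minimal Python sketch. It assumes the input frames have already been decoded to equally long lists of 16-bit linear samples (the decoding and re-encoding around the mix are omitted), and the function names `mix_linear` and `mix_minus` are hypothetical. A per-input gain of 0.0 eliminates an input and a gain below 1.0 attenuates it, mirroring the proposed elimination or attenuation of noisy inputs.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_linear(inputs: Sequence[Sequence[int]], gains: Sequence[float]) -> List[int]:
    """Linear combination of decoded 16-bit frames with per-input gains."""
    if not inputs or len(inputs) != len(gains):
        raise ValueError("need at least one input and exactly one gain per input")
    length = len(inputs[0])
    if any(len(frame) != length for frame in inputs):
        raise ValueError("all frames must have the same length")
    mixed = []
    for n in range(length):
        acc = sum(g * frame[n] for frame, g in zip(inputs, gains))
        # Saturate to the 16-bit range instead of letting the sum overflow.
        mixed.append(max(INT16_MIN, min(INT16_MAX, round(acc))))
    return mixed


def mix_minus(inputs: Sequence[Sequence[int]], gains: Sequence[float]) -> List[List[int]]:
    """One output per participant that leaves out that participant's own input.

    This output policy is an assumption for illustration; the text only says
    that N outputs are prepared from M inputs by switching and/or mixing.
    """
    outputs = []
    for i in range(len(inputs)):
        own_removed = [0.0 if j == i else g for j, g in enumerate(gains)]
        outputs.append(mix_linear(inputs, own_removed))
    return outputs
```

For example, `mix_linear([[1000, -2000, 30000], [500, 500, 10000]], [1.0, 0.5])` returns `[1250, -1750, 32767]`; the last sample is clipped.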

This means, however, that an additional decoding and
encoding step is introduced as well, which should be
avoided both for the sake of audio quality, as mentioned
above, and to keep the processing delay small.

SUMMARY OF THE INVENTION 

It is an object of the invention to provide a network 
element and a method that allow for a satisfactory 
enhancement of the quality of digitised analogue signals 
transmitted via a digital network on the network side. 

On the one hand, this object is reached by a network 
element for enhancing the quality of digitised analogue 
signals transmitted at least in parameterised coded form 
via a digital network to which the network element has
access, comprising: a payload extraction block for 
extracting coded digitised analogue signals from the 
digital network, which coded digitised analogue signals 
include at least in part parameterised coded digitised 
analogue signals; first processing means for processing 
the extracted parameterised coded digitised analogue 
signals in the parameter domain with functions suitable 
to enhance the quality of the digitised analogue signals; 
second processing means for processing at least part of 
the extracted coded digitised analogue signals in the 
linear domain with functions suitable to enhance the 
quality of the digitised analogue signals; a payload 
insertion block for inserting processed coded digitised 
analogue signals to the digital network; and analysing 
and selecting means for determining the quality 
improvement of the digitised analogue signals resulting 
from a processing of the extracted coded digitised 
analogue signals in the parameter domain and from a 
processing of the extracted coded digitised analogue 
signals in the linear domain and for causing that at 
least coded digitised analogue signals processed by the 
processing means leading to the better improvement are 
inserted back to the digital network by the payload 
insertion block. 

On the other hand, the object is reached by a method for 
enhancing the quality of digitised analogue signals 
transmitted at least in parameterised coded form via a 
digital network, comprising: 

- extracting coded digitised analogue signals from the
digital network, which coded digitised analogue
signals include at least in part parameterised coded
digitised analogue signals;

- determining the quality improvement of the digitised 
analogue signals to be expected by a processing of the 
extracted encoded digitised analogue signals in the 
parameter domain and by a processing of the extracted 
encoded digitised analogue signals in the linear 
domain; 

- processing the extracted parameterised coded digitised 
analogue signals in the parameter domain at least if a 
greater quality improvement is expected by processing 
in the parameter domain, with functions suitable for 
enhancing the quality of digitised analogue signals; 
and 

- processing at least part of the extracted coded
digitised analogue signals in the linear domain at
least if a greater quality improvement is expected by
processing in the linear domain, with functions
suitable for enhancing the quality of digitised
analogue signals; and

- inserting at least those processed coded digitised
analogue signals to the digital network that were
processed in the domain, the processing in which was
expected to result in a greater quality improvement.

By including a possibility for processing transmitted 
coded digitised analogue signals not only in the linear 
domain but also in the parameter domain, the network 
element and the method according to the invention enable 
an optimal enhancement of the quality of digitised 
analogue signals on the network side. 
The analysing and selecting means of the network element
of the invention determine whether linear and/or
parameter domain processing should be used by analysing
whether linear or parameter domain processing results in
a better quality improvement of the digitised analogue
signals. A corresponding step is provided in the method
of the invention. For example, if parameter domain
processing is not technically feasible for the
enhancement of the signal quality, linear processing is
expected to result in a better quality enhancement. If
processing in the parameter domain is possible, the
expected quality enhancement is determined for both kinds
of processing and the selection is based on a comparison
of the expected enhancements.
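A minimal Python sketch of this decision rule follows. The text leaves open how the expected improvements are estimated, so the `Estimate` fields and the callables `process_parameter` and `process_linear` are hypothetical placeholders; `process_linear` is assumed to include the decoding before and the encoding after linear processing. Ties are resolved towards the parameter domain, since that avoids the additional decoding and encoding.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Tuple


class Domain(Enum):
    PARAMETER = "parameter"
    LINEAR = "linear"


@dataclass
class Estimate:
    parameter_feasible: bool          # can the needed functions run on parameters at all?
    parameter_gain: Optional[float]   # expected improvement by parameter domain processing
    linear_gain: float                # expected improvement by linear domain processing


def select_domain(est: Estimate) -> Domain:
    # If parameter domain processing is not technically feasible, linear wins.
    if not est.parameter_feasible or est.parameter_gain is None:
        return Domain.LINEAR
    # Otherwise compare the expected enhancements; ties favour the parameter domain.
    if est.parameter_gain >= est.linear_gain:
        return Domain.PARAMETER
    return Domain.LINEAR


def enhance(frame: bytes, est: Estimate,
            process_parameter: Callable[[bytes], bytes],
            process_linear: Callable[[bytes], bytes]) -> Tuple[bytes, Domain]:
    """Process a coded frame in the selected domain and return it for insertion."""
    domain = select_domain(est)
    if domain is Domain.PARAMETER:
        return process_parameter(frame), domain
    return process_linear(frame), domain
```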

In case processing of the extracted signals in the
parameter domain is expected to lead to a better
enhancement of the quality of the digitised analogue
signal, at least the signals processed in the parameter
domain are inserted into the network again. In case
processing of the extracted signals in the linear domain
is expected to lead to a better enhancement of the
quality of the digitised analogue signal, only signals
processed in the linear domain are inserted into the
network again.

In the case that processing in the parameter domain is
expected to lead to better results, signals processed in
the linear domain should only be inserted into the
network in addition to signals processed in the parameter
domain if the processing in the linear domain leads to a
larger processing delay because of necessary
time-consuming pre- and after-treatments. This way, it is
possible to dispense with the disadvantageous additional
decoding and encoding of the extracted signals that is
necessary for processing parameterised coded digitised
analogue signals in the linear domain. Avoiding additional
decoding and encoding means a better quality of the
digitised analogue signals and, at the same time, less
processing delay. For example, parameterised coded
digitised analogue signals transmitted via packet based
networks, as well as coded digitised analogue signals
transmitted in the TFO stream in a TDM based network,
require decoding before and encoding after processing in
the linear domain, while coded digitised analogue signals
transmitted in the PCM stream in a TDM based network
require only A-law or µ-law to linear conversions, and
vice versa, for linear processing.
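The A-law or µ-law to linear conversion mentioned above is simple compared with a full speech decoder. The following Python sketch expands single G.711 code words to 16-bit linear samples using the bit manipulation commonly found in G.711 implementations (inversion of the whole code word for µ-law, inversion of the even bits with 0x55 for A-law). The function names are illustrative, and the reverse linear-to-G.711 compression needed after processing is omitted.

```python
def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 mu-law code word to a 16-bit linear sample."""
    code = ~code & 0xFF               # mu-law code words are transmitted inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07     # segment number
    mantissa = code & 0x0F            # step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # 0x84 is the bias
    return -magnitude if sign else magnitude


def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 A-law code word to a 16-bit linear sample."""
    code ^= 0x55                      # A-law inverts the even bits
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    if exponent == 0:
        magnitude = (mantissa << 4) + 8
    else:
        magnitude = ((mantissa << 4) + 0x108) << (exponent - 1)
    return magnitude if sign else -magnitude  # in A-law a set sign bit means positive


def decode_pcm(octets: bytes, law: str = "A") -> list:
    """Convert a buffer of G.711 octets ("A" or "mu" law) to linear samples."""
    if law not in ("A", "mu"):
        raise ValueError("law must be 'A' or 'mu'")
    expand = alaw_to_linear if law == "A" else ulaw_to_linear
    return [expand(octet) for octet in octets]
```

For instance, the A-law code words 0xD5 and 0x55 expand to +8 and -8, the smallest magnitudes, and the µ-law code word 0xFF expands to 0.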

While the signals to be inserted to the network again are 
selected according to the expected quality improvement, a 
processing in both domains can be carried out in any 
case, if the processed signals are to be evaluated for 
determining which processing is expected to lead to a 
better result. In case that only signals processed in the 
parameter domain are to be inserted to the network again, 
this insertion can be carried out before the processing 
in the linear domain is completed. The signals processed 
in the linear domain are then used as soon as they are 
ready for determining the future expected quality 
improvements by linear processing. 

Preferred embodiments of the invention become apparent 
from the subclaims. 



WO 02/054744 






PCT/EP00/13320



The analysing and selecting means of the network element
of the invention can base its decision whether processing
in the parameter domain or in the linear domain is to be
carried out on an analysis of incoming parameter domain
data, like parameters for gains. Alternatively or
additionally, it can base the decision on measurements,
like voice level, signal-to-noise ratio and presence of
echo, carried out in the linear domain after decoding.
Preferably, the measurements and the selection are made
before and after the input data is processed in the
linear and in the parameter domain. The selection of the
processing domain can then be made by comparing the
measurements to fixed thresholds that suggest either
linear or parameter domain processing. The numerical
values for the thresholds can be derived, e.g., by
performing real listening tests with varying test input
data that is processed and assessed in both domains.
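One way to realise the threshold comparison is a simple vote in which each measurement suggests a domain. The Python sketch below assumes three hypothetical measurements (`snr_db`, `echo_return_loss_db`, `level_deviation_db`); the threshold values, the domain each side of a threshold suggests, the voting scheme and the tie-break towards the parameter domain are all placeholder assumptions, not part of the described method. As the text notes, real thresholds would come from listening tests.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ThresholdRule:
    measurement: str
    threshold: float
    below: str         # domain suggested when the value is below the threshold
    at_or_above: str   # domain suggested otherwise


# Placeholder thresholds for illustration only; real values would be derived
# from listening tests with test data processed in both domains.
RULES: List[ThresholdRule] = [
    ThresholdRule("snr_db", 15.0, below="linear", at_or_above="parameter"),
    ThresholdRule("echo_return_loss_db", 30.0, below="linear", at_or_above="parameter"),
    ThresholdRule("level_deviation_db", 10.0, below="parameter", at_or_above="linear"),
]


def select_by_thresholds(measurements: Dict[str, float],
                         rules: List[ThresholdRule] = RULES) -> str:
    votes = {"parameter": 0, "linear": 0}
    for rule in rules:
        value = measurements.get(rule.measurement)
        if value is None:
            continue  # measurement not available, e.g. no decoding done yet
        domain = rule.below if value < rule.threshold else rule.at_or_above
        votes[domain] += 1
    # Ties go to the parameter domain, which avoids extra decoding and encoding.
    return "linear" if votes["linear"] > votes["parameter"] else "parameter"
```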

As several factors affect the choice of the processing 
domain, it may be difficult to formulate threshold 
patterns that result in the best choices in all call 
conditions. Therefore, in a further preferred embodiment, 
a neural network based approach is used for selecting the 
processing domain that is expected to bring the better 
results. Incoming parameter domain data and results from 
measurements after decoding can be used as the input for 
the neural network of N neurons. Weights or coefficients 
for the neurons can be derived by training the network 
with appropriate test data and outputs from real 
listening tests. 
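The neural network selector can be sketched as a small feed-forward net. The structure below (one hidden layer of N tanh neurons feeding a single sigmoid output, read as "parameter" when the output is at least 0.5) is an assumption chosen for illustration. The input features would be incoming parameter domain data and post-decoding measurements, and the weights would come from training with test data and listening test results; the weights in the usage example are arbitrary.

```python
import math
from typing import List, Sequence


class DomainSelectorNet:
    """Feed-forward net: features -> N tanh neurons -> one sigmoid output.

    An output near 1 suggests parameter domain processing, near 0 linear
    domain processing. The weights are assumed to come from offline training.
    """

    def __init__(self, hidden_w: List[List[float]], hidden_b: List[float],
                 out_w: List[float], out_b: float) -> None:
        if not (len(hidden_w) == len(hidden_b) == len(out_w)):
            raise ValueError("one bias and one output weight per hidden neuron")
        self.hidden_w, self.hidden_b = hidden_w, hidden_b
        self.out_w, self.out_b = out_w, out_b

    def score(self, features: Sequence[float]) -> float:
        if any(len(row) != len(features) for row in self.hidden_w):
            raise ValueError("feature vector has the wrong length")
        hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
                  for row, b in zip(self.hidden_w, self.hidden_b)]
        z = sum(w * h for w, h in zip(self.out_w, hidden)) + self.out_b
        # Numerically stable logistic function.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def select(self, features: Sequence[float]) -> str:
        return "parameter" if self.score(features) >= 0.5 else "linear"
```

Usage with two arbitrary neurons: `DomainSelectorNet([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [2.0, -2.0], 0.0).select([1.0, 0.0])` returns `"parameter"`.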
The processing means for processing in the parameter
domain and the processing means for processing in the
linear domain may include a variety of functions. Echo
cancellation, noise reduction and level control are
possible functions both for processing in the parameter
domain and for processing in the linear domain. In
addition, transcoding and speech mixing for a conference
bridge are possible functions, at least for processing in
the parameter domain.

For example, for a gain control in the parameter domain, 
the gain parameters of the extracted parameterised coded 
digitised analogue signals can be compared with a desired 
gain for forming corresponding new gain parameters. The 
desired gain parameters can be pre-set, input by the user 
or calculated out of the received gain parameters. The 
new gain parameters are then inserted into the extracted 
parameterised coded digitised analogue signals, thus 
substituting the original gain parameters.
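The gain control idea can be sketched in Python under simplifying assumptions: each frame is modelled as a hypothetical `Frame` whose gain parameters have already been dequantised to linear amplitude factors, the current level is taken as the mean of these gains, and the correction factor is clamped to avoid abrupt level jumps. Re-quantising the new gains to the codec's gain tables, which a real implementation would need, is omitted.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Frame:
    gains: Tuple[float, ...]     # hypothetical: dequantised per-subframe gains
    other_params: bytes = b""    # LPC, pitch and other parameters stay untouched


def control_gain(frame: Frame, desired_gain: float,
                 max_boost: float = 4.0, max_cut: float = 0.25) -> Frame:
    """Return a copy of the frame whose gains are scaled towards desired_gain."""
    if not frame.gains:
        return frame
    current = sum(frame.gains) / len(frame.gains)
    if current <= 0.0:
        return frame  # silent frame: nothing sensible to scale
    # Compare the received gain with the desired gain and limit the correction.
    factor = min(max(desired_gain / current, max_cut), max_boost)
    # The new gain parameters substitute the original ones.
    return replace(frame, gains=tuple(g * factor for g in frame.gains))
```

The desired gain could equally be pre-set, entered by the user or derived from a running average of received gains, as the text allows.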

In order to achieve a noise suppression by processing in 
the parameter domain, a processing in the time domain or 
in the frequency domain, preferably in both, is carried 
out. In the time domain, noise portions and low level 
signal portions of the extracted parameterised coded 
digitised analogue signals are attenuated and 
corresponding gain parameters are inserted in the 
extracted parameterised coded digitised analogue signals, 
thus replacing the original gain parameters. In the 
frequency domain, frequency portions of noise in the 
extracted parameterised coded digitised analogue signals 
which have approximately the same energy as the noise 
estimate are attenuated. Corresponding linear prediction 
parameters are then inserted to the extracted
parameterised coded digitised analogue signals, thus 
replacing the original linear prediction parameters. 
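The time domain part of this noise suppression can be sketched in Python. The sketch works on a hypothetical sequence of dequantised frame gain values, keeps a crude noise floor estimate that follows drops immediately and rises slowly, and attenuates frames whose gain stays within a margin of that floor. The tracking rule and all constants are illustrative assumptions; the frequency domain part, which would modify the linear prediction parameters, is not shown.

```python
from typing import List, Optional, Sequence


def suppress_noise_gains(frame_gains: Sequence[float], margin: float = 2.0,
                         attenuation: float = 0.3, rise: float = 1.02,
                         floor_min: float = 1e-6) -> List[float]:
    """Return new gain parameters, one per frame (time domain part only)."""
    noise_floor: Optional[float] = None
    out: List[float] = []
    for g in frame_gains:
        if noise_floor is None or g < noise_floor:
            noise_floor = max(g, floor_min)  # follow decreases immediately
        else:
            noise_floor *= rise              # creep upwards slowly
        if g <= margin * noise_floor:
            out.append(g * attenuation)      # noise or low-level portion
        else:
            out.append(g)                    # clearly above the noise: keep
    return out
```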

For echo suppression in the parameter domain,
parameterised coded digitised analogue signals are
extracted from both transmission directions. The signals
of the two directions can then be compared in order to
detect echoes in the parameterised coded digitised
analogue signals of a first direction. Portions of these
first parameterised coded digitised analogue signals are
replaced by comfort noise portions if an echo was detected
in the respective portion. The echo signal can also first
be attenuated, with the residual echo signal suppressed
afterwards. It is proposed to include a possibility for
by-passing the first parameterised coded digitised
analogue signals without echo compensation if there is no
signal activity in the opposite direction or if the
signal level of the extracted parameterised coded
digitised analogue signals in the opposite direction is
below a threshold level.
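A Python sketch of the comfort noise replacement and the by-pass condition follows. It assumes hypothetical per-frame levels in dB for both directions (which could, for example, be derived from gain parameters), a known echo delay in frames, and opaque frame payloads. The detection rule, treating a near-end frame as echo when it is at least `erl_db` weaker than the delayed far-end frame, and all constants are assumptions for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def suppress_echo(near_levels_db: Sequence[float], far_levels_db: Sequence[float],
                  near_frames: Sequence[T], comfort_noise: T, echo_delay: int = 2,
                  far_active_db: float = -45.0, erl_db: float = 6.0) -> List[T]:
    """Replace near-end frames judged to be pure echo by a comfort noise frame."""
    if len(near_levels_db) != len(near_frames):
        raise ValueError("one level per near-end frame is required")
    out: List[T] = []
    for n, frame in enumerate(near_frames):
        k = n - echo_delay  # far-end frame that could have caused echo in frame n
        far_db = far_levels_db[k] if 0 <= k < len(far_levels_db) else float("-inf")
        if far_db < far_active_db:
            out.append(frame)            # by-pass: no activity in the opposite direction
        elif near_levels_db[n] <= far_db - erl_db:
            out.append(comfort_noise)    # near end clearly weaker than far end: echo only
        else:
            out.append(frame)            # double talk or genuine near-end speech
    return out
```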

In a preferred embodiment of the invention, a bad frame
handler block is included in the network element. This
block may work together with the payload extraction block
and the processing means for detecting missing frames,
e.g. from RTP (Real-time Transport Protocol) sequence
numbers, for regenerating missing frames, e.g. by using
interpolation techniques or copying previous frames, and
for reordering frames in disorder within a buffering
window. A suitable location for the bad frame handler
block is immediately after the payload extraction block.
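The three tasks of the bad frame handler can be sketched in Python. The sketch assumes packets arrive as hypothetical `(sequence_number, payload)` pairs and that one buffering window of `window` frames starting at `first_seq` is released at a time. It accounts for the 16-bit wrap-around of RTP sequence numbers and regenerates a missing frame by copying the previous one, which is one of the two options named above; interpolation, and the gradual muting that practical concealment often applies to repeated frames, are omitted.

```python
from typing import Dict, Iterable, List, Optional, Tuple

SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide and wrap around


def reorder_and_conceal(packets: Iterable[Tuple[int, bytes]], first_seq: int,
                        window: int, silence: bytes = b"") -> List[bytes]:
    """Return `window` frames in sequence order, filling gaps by frame repetition."""
    by_offset: Dict[int, bytes] = {}
    for seq, payload in packets:
        offset = (seq - first_seq) % SEQ_MOD   # position within the window
        if offset < window:                    # packets outside the window are dropped
            by_offset.setdefault(offset, payload)  # keep the first copy of duplicates
    frames: List[bytes] = []
    previous: Optional[bytes] = None
    for offset in range(window):
        payload = by_offset.get(offset)
        if payload is None:                    # missing frame detected
            payload = previous if previous is not None else silence
        frames.append(payload)
        previous = payload
    return frames
```

For example, packets with sequence numbers 65535, 1, 0 and 3 in that arrival order, with `first_seq=65535` and `window=5`, come out as frames A, B, C, C (repeated), E.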
In a further preferred embodiment of the invention, the
network element comprises analysing means for determining
whether any processing is to be applied to the extracted
parameterised coded digitised analogue signals and for
selecting the functions that are to be applied to
extracted coded digitised analogue signals in the
parameter domain and/or the linear domain. These
functions can be included in the analysing and selecting
means used for determining the quality improvement
expected by processing in the parameter domain and by
processing in the linear domain.

In case no processing is deemed to be necessary, the 
coded digitised analogue signals can simply pass one or 
both of the processing means without any processing being 
carried out.

The choice can be made by the analysing means
autonomously by analysing the received coded digitised
analogue signals and possibly by analysing already
processed signals. Alternatively or additionally, the
choice may depend on an external control signal. Even if
an external control signal is employed and does not ask
for any processing to be carried out, the analysing means
can evaluate the quality of the received parameterised
coded digitised analogue signals, e.g. with regard to
speech level, existence of echo and signal-to-noise
ratio, and select one or several processing functions.
The external control signal can enter the network element
via a control block in the network element, which may
conform to the H.248 protocol. The control signal may
indicate, for example, that there is already an echo
canceller on the connection and that the received
parameterised coded digitised analogue signals can
therefore be forwarded by the processing means without
echo cancellation. The control block can also have direct
access to the processing means in order to select by
itself the processing functions that are to be carried out.
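How measurements and external control information might combine into a function selection can be sketched in Python. `Measurements`, `ExternalControl` and all thresholds are hypothetical; in particular, the target level of -26 dBov and the tolerances are placeholders, and the `echo_canceller_present` flag merely stands in for whatever the H.248 control block actually reports. An empty result means the frames pass the processing means unprocessed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Measurements:
    speech_level_dbov: float
    echo_detected: bool
    snr_db: float


@dataclass
class ExternalControl:
    # Hypothetical summary of what the H.248 control block might report.
    echo_canceller_present: bool = False


def select_functions(m: Measurements, ctrl: Optional[ExternalControl] = None,
                     target_level_dbov: float = -26.0, level_tolerance_db: float = 6.0,
                     min_snr_db: float = 20.0) -> List[str]:
    """Return the processing functions to apply; an empty list means pass-through."""
    functions: List[str] = []
    echo_handled_elsewhere = ctrl is not None and ctrl.echo_canceller_present
    if m.echo_detected and not echo_handled_elsewhere:
        functions.append("echo_cancellation")
    if m.snr_db < min_snr_db:
        functions.append("noise_reduction")
    if abs(m.speech_level_dbov - target_level_dbov) > level_tolerance_db:
        functions.append("level_control")
    return functions
```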

Selection of the most suitable functions to be employed 
is also a preferred feature of the method according to 
the invention. 

The digital network involved may be either packet based,
like IP, UDP (User Datagram Protocol) or RTP (Real-time
Transport Protocol) networks, or TDM based. Still, any
other digital network transmitting parameterised coded
digitised analogue signals can be accessed as well. When
referring in this specification to an IP network, this
includes any IP, UDP or RTP network.

In a packet based network, the digitised analogue signals
are only transmitted as parameterised coded digitised
analogue signals. In a TDM based network, employed e.g.
for GSM, the digitised analogue signals can be
transmitted as parameterised coded digitised analogue
signals in a TFO stream and simultaneously in a PCM
(Pulse Code Modulation) stream as A-law or µ-law coded
G.711 PCM samples.

Accordingly, in one preferred alternative, the payload 
extraction block is suitable to extract parameterised 
encoded digitised analogue signals from an IP stack of a 
packet-based network and the payload insertion block is
suitable to insert parameterised encoded digitised 
analogue signals to said IP stack of the packet-based 
network. 

In another preferred alternative, the payload extraction 
block is suitable to extract a TFO stream and, if 
desired, in addition a PCM stream from the timeslots of a 
TDM based network. In the latter case, the two streams
are separated in the payload extraction block for further
processing, and the payload insertion block is suitable 
to combine a supplied TFO stream with a supplied PCM 
stream again and to insert the combined stream to said 
TDM based network. If the payload insertion means is only 
provided with a PCM stream, however, it can also insert 
only this PCM stream back to said TDM based network 
again. 

In GSM-PCM, the payload extraction block can take only 
the TFO stream as input or alternatively the TFO stream 
and the PCM stream, which are then separated in the 
payload extraction block. 

An extracted TFO stream that is inserted to the digital 
network again has either been processed in the parameter 
domain or in the linear domain with a decoding before and 
an encoding after the linear processing. Which kind of 
TFO stream is inserted should depend on the achieved or 
achievable quality improvement of the included digitised 
analogue signal. In addition, the TFO stream processed 
after decoding in the linear domain should be transformed 
without prior encoding into a PCM stream that is combined 
with the selected encoded TFO streams for insertion into
the digital network. However, in case no TFO stream is 
available at the payload extraction means or in case the 
TFO stream is stopped, the PCM stream can be extracted 
and processed in the linear domain and output to the 
digital network via the payload insertion means by 
itself. 

Alternatively, the TFO stream can be processed in the 
parameter domain and the PCM stream, which does not have 
to be decoded for linear processing, can be processed in 
parallel in the linear domain. In case the TFO stream is 
only processed if it is expected to lead to a better 
result than the processing of the PCM stream, the TFO 
stream is not necessarily included in the data inserted 
to the network again when not processed. 

The network element according to the invention can be 
located freely beside or inside any other network 
element. In a packet based network, the network element 
of the invention is preferably co-located with a
broadband IP node, which leads to minimal processing
delays.

The network element and the method of the invention can 
be used for the enhancement of the quality of any 
digitised analogue signals transmitted by a digital 
network in parameterised coded form. It is of particular 
relevance for transmitted speech, but also e.g. for 
video. 



BRIEF DESCRIPTION OF THE FIGURES 
In the following, the invention is explained in more
detail with reference to drawings, of which 

Fig. 1 shows the integration of the network element according to the invention in an IP network;
Fig. 2 shows a first embodiment of the network element according to the invention;
Fig. 3 shows a second embodiment of the network element according to the invention;
Fig. 4 shows a third embodiment of the network element according to the invention;
Fig. 5 shows a block diagram of an embodiment of a parameter domain gain control;
Fig. 6 shows a block diagram of an embodiment of a parameter domain noise suppression;
Fig. 7 shows a block diagram of an embodiment of a parameter domain echo suppression; and
Fig. 8 shows a block diagram of an embodiment of a parameter domain echo cancellation.

DETAILED DESCRIPTION OF THE INVENTION 

Figure 1 shows the environment of a network element 1 
according to the invention. 

A first terminal 2 is connected via an IP network with a
second terminal 3. Both terminals 2, 3 can be IP phones.
At some place in the IP network, there is an IP router
forming a broadband IP node 4. Co-located with and
connected to this network node 4, there is a network
element 1 according to the invention.

Network element 1 operates in the speech parameter domain 
and is able to perform signal processing functions for 
parameterised coded speech. The available functions are 
echo cancellation, noise reduction, gain control, 
conference bridge and bad frame handling. Possibilities 
for realising some of those functions will be described 
later with reference to figures 5 to 8. 

Parameterised coded speech passes from the first terminal
2 to the network node 4. It is forwarded from the
network node 4 to the network element 1, which carries
out the appropriate functions in the speech parameter
domain. Then, the processed parameterised coded speech is
sent back to the network node 4, which forwards it to
its destination, the second terminal 3.

Figure 2 shows the different elements comprised in an 
embodiment of the network element 1 of figure 1. 

A payload extraction block 20 and a payload insertion
block 21 together form the interface of the network
element 1 to the network node 4. Within the network
element 1, the payload extraction block 20 is connected
via a bad frame handler block 22 to an analyser and
selector block 23. The two outputs of the analyser and
selector block 23 are connected on the one hand to first
processing means 24 and on the other hand via a speech
decoding block 25 to second processing means 26. Each of
the processing means 24, 26 comprises a function for echo
cancellation, for noise reduction and for level control.
The output of the first processing means 24 is connected
to the input of a selector 27. The output of the second
processing means 26 is likewise connected to the input of
the selector 27, but via a speech encoding block 28. The
output of the selector 27 is input to the payload
insertion block 21. Finally, there is a control block 29,
e.g. an H.248 protocol control block, which receives as
input a control signal generated externally of the
network element 1 and the output of which is connected to
the analyser and selector block 23.

The network element 1 functions as follows: 

The payload extraction block 20 extracts the payload,
i.e. parameterised coded speech, from the IP stack of the
network node 4 of figure 1. The speech parameters are
checked by the bad frame handler block 22. Here, missing
frames are detected and regenerated by using
interpolation techniques. Moreover, frames in disorder
are reordered within a buffering window. The processed
signals are then forwarded to the analyser and selector
block 23.

The analyser and selector block 23 analyses the speech 
parameters and determines whether a processing in the 
linear domain or in the parameter domain would lead to a 
better result and which of the available functions should 
be applied. If parameter domain processing is not 
technically feasible for the speech enhancement, linear 
processing is selected. The analyser and selector block 
23 can also determine that no processing at all needs to
be carried out. The analyser and selector block 23
receives in addition external information via the control 
block 29, indicating for example whether there is already 
an echo canceller on the connection so that a further 
echo cancellation is not necessary. 

If no processing or processing in the parameter domain
was selected, the analyser and selector block 23 outputs
the encoded speech to the first processing means 24,
which applies all selected functions to the parameterised
coded speech in the parameter domain.

If processing in the linear domain was found to be
necessary, the analyser and selector block 23 outputs the
parameterised coded speech to the speech decoding block
25. The speech decoding block 25 decodes the coded
speech, which may be GSM FR (Full Rate) coded, to
form a linear signal. The linear speech signal is then
input to the second processing means 26, which applies
all selected functions to the linear speech signal in the
linear domain. After processing, the linear speech signal
is input to the speech encoding block 28, which encodes
the linear speech signal to form GSM FR parameterised
coded speech again.

The selector 27 receives the output signals of the speech 
encoding block 28 and of the first processing means 24 
and is moreover controlled by the analyser and selector 
block 23. Therefore, the selector 27 is able to
determine whether the signals from the first processing
means 24 or the signals from the speech encoding block 28
constitute the processed coded speech, and to forward the
respective signals to the payload insertion block 21. The
selector 27 can moreover support the work of the analyser
and selector block 23 by providing information about
processed signals.

In the payload insertion block, the parameterised coded 
speech is inserted back as payload to the IP stack of the 
network node 4, from where it is forwarded to its 
destination 3. 

On the whole, an enhancement of the quality of speech can
be achieved, while additional decoding and encoding is
only carried out if necessary. An unnecessary decrease in
the speech quality is therefore avoided, and the
processing delay is kept low by the processing in the
parameter domain. Since the network element 1 is
co-located with the broadband IP node 4, processing delays
are further minimised.

Figure 3 schematically illustrates another embodiment of 
the network element of the invention. The embodiment is 
similar to the first embodiment of the network element, 
but it is employed for processing of encoded speech
parameters received from a network node in a TDM based 
network, which is used for GSM TFO. 

Like the network element of figure 2, the network
element of figure 3 comprises a payload extraction block
30, a bad frame handler 32, an analyser and selector
block 33, a decoding block 35, first and second
processing means 34, 36, an encoding block 38, a payload
insertion block 31 and an H.248 control block 39. Both
processing means 34, 36 again comprise functions for echo
cancellation, noise reduction and level control. The
elements are connected to each other in the same way as
in figure 2. In contrast to the network element of figure
2, however, instead of a selector 27, a second
analyser and selector block 37 is integrated between the
encoding block 38 and the payload insertion block 31.
Moreover, the output of the second processing means 36 is
not only connected to the encoding block 38, but also
directly to the payload insertion block 31.

The network element of the second embodiment functions as
follows:

The signal entering the payload extraction block 30 from 
a network node contains a G.711 PCM stream of 48 or 56 
kbps in the most significant bits and GSM TFO encoded 
speech parameters at 16 or 8 kbps in the least 
significant bits. In the payload extraction block 30, the
TFO stream is separated from the PCM stream. Only the TFO 
stream is forwarded to the bad frame handler block 32, 
where it is treated as described for the treatment of the 
parameterised coded speech in the embodiment of figure 2. 
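The bit split performed in the payload extraction block 30 can be illustrated with a short sketch. It assumes, as a simplification, that the TFO channel occupies a fixed number of least significant bits in every 8 kHz G.711 octet (one bit for 8 kbps, two bits for 16 kbps), which reproduces the 56/48 kbps figures above; TFO frame synchronisation is ignored, and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # G.711: 8000 octets per second = 64 kbps


def channel_rates(tfo_bits):
    """Return (PCM kbps, TFO kbps) when tfo_bits LSBs of each octet carry TFO."""
    if tfo_bits not in (1, 2):
        raise ValueError("TFO is assumed to use 1 or 2 LSBs per octet")
    return (8 - tfo_bits) * SAMPLE_RATE_HZ // 1000, tfo_bits * SAMPLE_RATE_HZ // 1000


def split_tfo(octets, tfo_bits):
    """Separate each octet into its PCM MSBs (LSBs cleared) and its TFO LSBs."""
    mask = (1 << tfo_bits) - 1
    pcm_msbs = [o & (0xFF ^ mask) for o in octets]
    tfo = [o & mask for o in octets]
    return pcm_msbs, tfo


def merge_tfo(pcm_msbs, tfo, tfo_bits):
    """Inverse operation, as needed on insertion: recombine MSBs and LSBs."""
    mask = (1 << tfo_bits) - 1
    return [(p & (0xFF ^ mask)) | (t & mask) for p, t in zip(pcm_msbs, tfo)]
```

With two TFO bits per octet, `channel_rates(2)` gives `(48, 16)`, i.e. the 48 kbps PCM and 16 kbps TFO case described above.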

After the bad frame handling, the TFO stream is input to
the analyser and selector block 33. The analyser and
selector block 33 forwards the TFO stream on the one hand
to the first processing means 34, where the stream is
processed in the parameter domain. On the other hand, the
analyser and selector block 33 forwards the TFO stream to
the decoding block 35, where speech decoding, e.g. again
GSM FR to linear decoding, is carried out. The decoded
TFO stream is then input to the second processing means
36, where it is processed in the linear domain. For both
processing means 34, 36, the functions to be applied are
chosen in the first analyser and selector block 33
according to an external control signal entering the
network element via the control block 39.

The output of the first processing means 34 is fed to the
second analyser and selector block 37. The output of the
second processing means 36 is speech encoded again in the
encoding block 38, e.g. linear to GSM FR encoding, and is
likewise fed to the second analyser and selector block 37.

The first analyser and selector block 33 and the second 
analyser and selector block 37 work together for 
determining which processing, the one in the parameter 
domain or the one in the linear domain, results in a 
better voice quality. 

In case that parameter processing of the TFO stream is 
determined to result in a better voice quality than 
linear processing of the decoded TFO stream, only the TFO 
stream coming from the first processing means 34 is 
forwarded by the second analyser and selector block 37 to 
the payload inserting means 31. In case that linear 
processing of the decoded TFO stream is determined to 
result in a better voice quality than parameter 
processing of the TFO stream, only the TFO stream coming 
from encoding block 38 is forwarded by the second 
analyser and selector block 37 to the payload inserting 
means 31. 
Both paths can be kept running all the time, so that a
change between the different modes, pure linear
processing and parallel processing, can be carried out
without discontinuities in the internal states of the
decoding block 35 and the encoding block 38.

In addition, the output of the second processing means 36
is forwarded directly, without any encoding, to the
payload insertion block 31. In the payload insertion
block 31, a PCM stream is formed from the decoded and
linearly processed TFO stream. The PCM stream and the
selected coded TFO stream are then combined and inserted
back into the TDM based network for further transmission.

Thus, the speech quality of the digitised analogue signal 
in the output PCM stream is improved by linear processing
and the speech quality of the digitised analogue signal 
in the output TFO stream is improved by processing in the 
parameter domain or in the linear domain, depending on 
which processing leads to a better result. 

If there is no TFO stream available in the signal
extracted by the payload extraction block 30, or if the
TFO stream is stopped, the PCM stream can instead be
conducted through the bad frame handler 32 for frame
related treatment and through the second processing means
36 for processing in the linear domain. No decoding block
needs to be passed, since the PCM stream does not contain
parameterised data. It should be noted, though, that
linear processing of a G.711 PCM stream requires A-law or
µ-law to linear conversion and vice versa. The processed
PCM stream is then inserted back into the digital network
by the payload insertion block 31.
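The µ-law to linear conversion and its inverse mentioned above can be sketched as follows. This is a minimal sketch in the style of the widely used G.711 reference implementations, with linear values on a 16-bit scale; A-law follows analogous segment rules and is not shown.

```python
BIAS = 0x84    # 132, added to the magnitude before the segment search
CLIP = 32635   # largest magnitude that still fits after adding BIAS


def ulaw_to_linear(code):
    """Expand one 8-bit mu-law code to a linear sample (16-bit scale)."""
    code = ~code & 0xFF                  # mu-law codes are transmitted bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07        # segment number
    mantissa = code & 0x0F               # step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def linear_to_ulaw(sample):
    """Compress one linear sample (16-bit scale) to an 8-bit mu-law code."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = 7
    while exponent > 0 and not (magnitude & (0x80 << exponent)):
        exponent -= 1                    # segment = position of the highest set bit
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF
```

Every code survives a decode/encode round trip except 0x7F ("negative zero"), which decodes to 0 and is re-encoded as 0xFF.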

Figure 4 schematically illustrates a third embodiment of 
the network element of the invention constituting a 
second option for enhancing the quality of speech in a 
TDM based network used for GSM TFO. 

In this example, a payload extraction block 40 is
connected via a bad frame handler block 42 directly to
first and second processing means 44, 46. Both processing
means 44, 46 again comprise functions for echo
cancellation, noise reduction and level control. The
outputs of the first and second processing means 44, 46
are likewise connected only directly to inputs of the
payload insertion block 41. An H.248 protocol control
block 49 is present again.

The network element of the third embodiment functions as 
follows: 

The PCM stream and the TFO stream entering the payload 
extraction block 40 from a network node are separated by 
the payload extraction block 40 as in the embodiment of 
figure 3. In this embodiment, however, both the TFO
stream and the PCM stream are forwarded to the bad frame
handler block 42 and treated there as explained with
reference to figure 2.

After the bad frame handling, the TFO stream is forwarded
to the first processing means 44, where it is processed
in the parameter domain. At the same time, the PCM
samples are forwarded to the second processing means 46.
Since in this embodiment only the PCM samples are
processed by the processing means 46 working in the
linear domain, no decoding block is necessary; as
mentioned with regard to the embodiment of figure 3, the
PCM stream does not contain parameterised data. In both
processing means 44, 46, the functions to be applied are
chosen according to an external control signal by means
of the control block 49 of the network element.

Thus, speech enhancement is carried out separately and
simultaneously for the TFO stream and for the PCM stream.
In no case is the coded speech in the TFO stream decoded
for processing and encoded again.

The TFO stream and the PCM stream leaving the processing 
means 44, 46 are combined in the payload insertion block 
41 and inserted back into the TDM based network for 
further transmission. It can be decided at some other 
place of the network which one of the streams should be 
used for obtaining the best voice quality. 

Each of the three described embodiments of the network 
element according to the invention allows for an 
enhancement of the quality of parameterised speech or 
video on the network side with minimal processing delay. 
They can be located freely beside or inside any existing 
network element. 

Different possibilities of processing in the parameter
domain in the first processing means 24, 34, 44 of
figures 2 to 4 will now be described with reference to
figures 5 to 8.

Figure 5 shows a block diagram of a gain control device 
that can be integrated in a first processing means of a 
network element according to the invention for gain 
control in the parameter domain. An input line is 
connected on the one hand to the input of a decoder 50 
and on the other hand to a first input of a gain 
parameter re-quantisation block 53. The decoder 50 is
further connected directly and via a speech level 
estimation block 51 to a linear-to-parameter domain 
mapping block 52. The output of the linear-to-parameter 
domain mapping block 52 is connected to a second input of 
the gain parameter re-quantisation block 53 which is 
connected in addition to an output line. 

Incoming coded speech frames are forwarded to the decoder 
50, where the coded speech is linearised before being fed 
to the speech level estimation block 51. The speech level 
estimation block 51 comprises an internal voice activity
detector (VAD) that indicates whether the level estimate
has to be updated, since only the level of active speech,
and not that of speech pauses, should enter the speech
level estimate.

In the speech level estimation block 51, a desired gain
value is calculated from the estimated speech level and a
predetermined target speech level. The desired gain is
fed to the first input of the linear-to-parameter domain
mapping block 52.
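The desired-gain calculation of block 51 can be sketched as below. This is a minimal sketch under stated assumptions: samples are normalised to ±1 so levels are in dB relative to full scale, the VAD decision is supplied from outside as a boolean, and the class name, target level, smoothing factor and gain limit are all hypothetical.

```python
import math


class SpeechLevelGain:
    """Hypothetical automatic level control: VAD-gated speech level -> desired gain."""

    def __init__(self, target_db=-26.0, alpha=0.05, max_gain_db=12.0):
        self.target_db = target_db      # assumed target speech level (dBFS)
        self.alpha = alpha              # assumed smoothing factor of the estimate
        self.max_gain_db = max_gain_db  # assumed limit on the applied gain
        self.level_db = None            # long-term speech level estimate

    def update(self, frame, speech_active):
        """Update the estimate on speech frames only; return the desired linear gain."""
        if speech_active:
            power = sum(x * x for x in frame) / len(frame)
            frame_db = 10.0 * math.log10(max(power, 1e-12))
            if self.level_db is None:
                self.level_db = frame_db
            else:
                self.level_db += self.alpha * (frame_db - self.level_db)
        if self.level_db is None:
            return 1.0                  # no speech seen yet: leave the level unchanged
        gain_db = self.target_db - self.level_db
        gain_db = max(-self.max_gain_db, min(self.max_gain_db, gain_db))
        return 10.0 ** (gain_db / 20.0)
```

Because the estimate only moves on frames flagged as speech, loud noise in pauses does not pull the gain down.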
The speech level estimation block 51 is only needed for
automatic level control. If a fixed gain control is to
be used instead, possibly with a user-settable gain, the
decoder 50 and the speech level estimation block 51 can
be omitted.

The decoded gain parameters of the current speech frame
of e.g. 20 ms, or of the current sub-frame of e.g. 5 ms,
are further fed to the linear-to-parameter domain mapping
block 52 directly from the decoder 50. These are
typically the excitation gain parameters of a code
excited linear prediction (CELP) speech coder, i.e. the
adaptive and fixed codebook gains, which are vector
quantised for transmission. Scalar values of these
parameters can be obtained from internal intermediate
values of the decoder 50.

In the linear-to-parameter domain mapping block 52, the
linear desired gain value is converted to appropriate new
gain parameters of the speech coder. A codebook based
mapping is used to determine these new gain parameters
for the current frame or sub-frame so that the desired
gain is achieved. The codebook is a three-dimensional
table whose dimensions are formed by the adaptive
codebook gain, the fixed codebook gain and the linear
gain value. The new gain parameter values are read from
the table as soon as all input values for the frame or
sub-frame are known. The table is trained beforehand such
that, for each desired linear gain value, the errors
between the new gain parameter values and the gain
parameter values of gain scaled coded frames are
minimised. Alternatively, the mapping table could be
trained by minimising the error between the decoded
re-quantised speech frame and a decoded gain scaled
speech frame. The training requires several test
sequences in order to fully train all elements of the
mapping table.
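The lookup side of the three-dimensional mapping table can be sketched as follows, assuming each input is snapped to the nearest grid point of its axis. The training procedure is not reproduced: `placeholder_entries` is a hypothetical, untrained stand-in that keeps the adaptive codebook gain and scales the fixed codebook gain by the desired gain, which only approximately scales the decoded CELP signal. All names are illustrative.

```python
import bisect


def nearest_index(axis, value):
    """Index of the grid point in a sorted axis that is closest to value."""
    i = bisect.bisect_left(axis, value)
    if i == 0:
        return 0
    if i == len(axis):
        return len(axis) - 1
    return i if axis[i] - value < value - axis[i - 1] else i - 1


class GainMappingTable:
    """3-D table: (adaptive gain, fixed gain, desired linear gain) -> new (g_p, g_c)."""

    def __init__(self, gp_axis, gc_axis, lin_axis, entries):
        self.axes = (gp_axis, gc_axis, lin_axis)
        self.entries = entries  # dict (i, j, k) -> (new_gp, new_gc), trained offline

    def lookup(self, gp, gc, lin_gain):
        key = tuple(nearest_index(ax, v) for ax, v in zip(self.axes, (gp, gc, lin_gain)))
        return self.entries[key]


def placeholder_entries(gp_axis, gc_axis, lin_axis):
    """Untrained stand-in: keep g_p, scale g_c by the desired linear gain."""
    return {(i, j, k): (gp, gc * g)
            for i, gp in enumerate(gp_axis)
            for j, gc in enumerate(gc_axis)
            for k, g in enumerate(lin_axis)}
```

Coarser axes shrink the table, which corresponds to the step-size option for compressing it mentioned next.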

In practical implementations it might be useful to 
compress the size of the table either by utilising 
redundancy in the data, by limiting linear gain values or 
by increasing the step size of the input values. Another
option is to find a mathematical function that
approximates the mapping function in such a way that the
performance is subjectively acceptable.

Finally, in the gain parameter re-quantisation block 53,
the new gain values are re-quantised for transmission and
the original gain values are replaced with the new
values.
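The re-quantisation step can be sketched as a nearest-codeword search over a joint gain codebook. This is a simplified sketch: the plain weighted squared error and the `gc_weight` factor are assumptions (real CELP encoders usually search in a perceptually weighted signal domain), and the frame dictionary with a `gain_index` field is a hypothetical stand-in for the coded frame format.

```python
def requantise_gains(gp, gc, codebook, gc_weight=1e-4):
    """Index of the joint (g_p, g_c) codeword closest to the new gains.

    gc_weight balances the very different numeric ranges of the two gains
    (an assumption made for this sketch).
    """
    def error(entry):
        return (entry[0] - gp) ** 2 + gc_weight * (entry[1] - gc) ** 2
    return min(range(len(codebook)), key=lambda i: error(codebook[i]))


def replace_gain_index(frame, new_index):
    """Return a copy of a (hypothetical) parameter frame with its gain index replaced."""
    updated = dict(frame)
    updated["gain_index"] = new_index
    return updated
```

All other parameters of the frame are passed through unchanged, which is what keeps this approach free of tandem coding.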

Figure 6 shows a block diagram of a noise suppression 
device that may be integrated in a first processing means 
of a network element according to the invention for noise 
suppression in the parameter domain. 

An input line is again connected on the one hand to the
input of a decoder 60 and on the other hand to a first
input of a gain parameter re-quantisation block 63. A
first output of the decoder 60 is connected to a block 67
for determining noise attenuation parameters via each of
a speech level estimation block 61, a VAD 66, a noise
level and spectrum estimation block 64 and a short term
signal level and spectrum calculation block 65. The
output of the VAD 66 is moreover connected to an input of
the speech level estimation block 61 as well as to an
input of the noise level and spectrum estimation block
64.

A first output of the block 67 for determining noise
attenuation parameters is connected to a first input of a
spectrum-to-LP (linear prediction) mapping block 68, and a
second output to a first input of a linear-to-parameter
domain mapping block 62.

A second output of the decoder 60 is connected to a 
further input of the noise level and spectrum estimation 
block 64 and of the short term signal level and spectrum 
calculation block 65 and additionally to a second input 
of the spectrum to LP mapping block 68. A third output of 
the decoder 60 is connected to a second input of the 
linear-to-parameter domain mapping block 62. 

The output of the linear-to-parameter domain mapping 
block 62 is connected to a second input of the gain 
parameter re-quantisation block 63, the output of which 
is in turn connected to a first input of an LP parameter
re-quantisation block 69. The second input of this block 
69 is connected to the output of the spectrum-to-LP 
mapping block 68. 

Finally, the output of the LP parameter re-quantisation 
block 69 is connected to an output line. 

The decoder 60, the speech level estimation block 61, the 
linear-to-parameter domain gain mapping block 62 and the 
gain parameter re-quantisation block 63 can be identical 
or quite similar to the corresponding blocks 50-53 of the
example of figure 5. 

In the example of figure 6, noise suppression can be
achieved by time-domain or frequency-domain parameter
processing. The best performance is obtained by combining
both methods.

The time-domain processing is based on a dynamic 
processing in which noise portions and very low level 
speech portions are slightly attenuated by a gain control 
function making use of the blocks 60-63 corresponding to 
the blocks 50-53 of figure 5. The gain control is
therefore carried out as explained above, except that
block 67 forwards the speech level estimate received from
block 61 to the linear-to-parameter domain mapping block
62. This can be understood as an expansion function in
the parameter domain.

In the frequency-domain noise suppression, the frequency
portions that are dominated by noise rather than speech
are attenuated. Traditionally, a linear time-domain
signal is first converted to the frequency domain by
means of a Fourier transform or filter banks. Then,
spectral subtraction can be applied to the
frequency-domain signal. The amount of subtraction is
based on a noise estimate, the signal-to-noise ratio and
possibly other parameters. Finally, the noise attenuated
signal is converted back to the time domain. In this
example, however, the frequency-domain processing is
carried out by re-shaping the Linear Prediction (LP)
spectrum envelope of speech frames. This is explained in
more detail in the following.

To achieve a high quality noise suppression, an accurate 
noise estimate has to be modelled. In order to 
differentiate between speech and speech pauses, a voice 
activity detector 66 is employed, which outputs a speech 
flag "true", when speech was detected and a speech flag 
"false" when a speech pause is detected. The voice 
activity detector 66 needs to be of high quality in order 
to get accurate VAD decisions even in low signal-to-noise
ratio conditions, otherwise speech and noise estimates 
will diverge. Basically, the speech level estimate is 
updated in the speech level estimation block 61 when the 
speech flag is true, and noise level and spectrum 
estimates are updated in the noise level and spectrum 
estimation block 64 when the speech flag is false. 
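The VAD-gated updating rule can be sketched as a pair of smoothed level estimates, one updated only on speech frames and one only in pauses, from which the long-term SNR used later in block 67 follows. The exponential smoothing and its factor are assumptions for illustration; the class name is hypothetical.

```python
class LevelTracker:
    """VAD-gated long-term speech and noise level estimates (in dB)."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha      # assumed smoothing factor
        self.speech_db = None   # as in block 61: updated when the speech flag is true
        self.noise_db = None    # as in block 64: updated when the speech flag is false

    def _smooth(self, old, new):
        return new if old is None else old + self.alpha * (new - old)

    def update(self, frame_db, speech_flag):
        if speech_flag:
            self.speech_db = self._smooth(self.speech_db, frame_db)
        else:
            self.noise_db = self._smooth(self.noise_db, frame_db)

    def snr_db(self):
        """Long-term SNR, or None until both estimates exist."""
        if self.speech_db is None or self.noise_db is None:
            return None
        return self.speech_db - self.noise_db
```

A wrong VAD decision feeds speech into the noise estimate or vice versa, which is why the estimates diverge with a poor VAD.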

In block 64, the long term noise level and spectrum are 
estimated. For the long term noise spectrum estimate, 
Linear Prediction Coefficients (LPC) need to be decoded 
in the decoder 60 from the received speech frame. The LP 
coefficients are often converted to Line Spectral Pairs 
(LSP) by the encoder employed for encoding. In that case, 
the LPC values can be obtained from internal intermediate 
values of the decoder 60. As the LP coefficients define 
only the spectral envelope, the noise level estimate is 
required to scale the LP spectral envelope, in order to 
form a power spectrum estimate of the noise. 
Alternatively, the LP spectral envelope could be scaled 
by using excitation gain parameters of the received 
frame. As already mentioned above, the noise estimate is
updated only if the VAD flag is false. 
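Forming a noise power spectrum from decoded LP coefficients and a noise level estimate can be sketched as follows. The sign convention A(z) = 1 + sum of a_k z^-k, the 64-bin grid from 0 to pi, and scaling so that the mean bin power equals the noise level estimate are assumptions made for this sketch.

```python
import cmath
import math


def lp_envelope(a, n_bins=64):
    """|1/A(e^jw)|^2 on n_bins frequencies from 0 to pi, A(z) = 1 + sum a_k z^-k."""
    env = []
    for b in range(n_bins):
        w = math.pi * b / (n_bins - 1)
        a_w = 1 + sum(ak * cmath.exp(-1j * w * k) for k, ak in enumerate(a, start=1))
        env.append(1.0 / abs(a_w) ** 2)
    return env


def noise_power_spectrum(a, noise_power, n_bins=64):
    """Scale the LP envelope so that its mean equals the estimated noise power."""
    env = lp_envelope(a, n_bins)
    scale = noise_power / (sum(env) / len(env))
    return [scale * e for e in env]
```

With an empty coefficient list the result is flat (white noise); a single coefficient of -0.9 yields a low-pass noise spectrum.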

In the short term signal level and spectrum calculation
block 65, a short-term signal level and spectrum are
calculated for the received frame in the same manner as
described above, except that no averaging, or only a fast
averaging, of previous frames is used for the level
calculation. Typically, VAD decisions are not used.

The main intelligence of the algorithm lies in the block
67 for determining noise attenuation parameters. In this
block 67, frequency-domain noise attenuation parameters
(i.e. the desired spectrum shaping) are selected
according to the long term noise spectrum estimate
received from block 64 and the short term signal spectrum
received from block 65. Correspondingly, the desired
time-domain gain is based on the long term speech and
noise levels and on the short term signal level.
Moreover, VAD information received from the VAD 66 and
the long term signal-to-noise ratio calculated from the
speech and noise level estimates received from blocks 61
and 64 are used as additional information by the
algorithm of block 67.

For the spectrum shaping in block 67, the long term noise
spectrum estimate is compared with the short term signal
spectrum. A target frame spectrum is shaped in such a way
that those short term spectrum parts that are quite close
to the long term spectrum are slightly attenuated. On the
other hand, those parts that are clearly above the long
term spectrum are left untouched, because they are likely
to contain speech information. Additionally, the
frequency and temporal masking of the human auditory
system can be exploited in the spectrum shaping. With
frequency masking, no shaping is required for those parts
of the spectrum that lie within an auditory frequency
masking curve. With temporal masking, no frequency
shaping (or time-domain processing) is needed for the
current frame if one or more previous frames contained a
higher speech level that masks the lower level signals of
the current frame. Applying these rules results in less
distortion of the processed speech, as less shaping is
done.

Furthermore, the spectrum shaping can be controlled by
the VAD flag in such a way that less shaping is applied
if a speech pause was detected. During speech pauses, the
noise attenuation is then mainly achieved by gain
processing in blocks 60-63. In addition, the short term
signal level can also control the amount of shaping: low
level frames receive less shaping, as their noise
attenuation is partly handled by gain processing.
Finally, the amount of spectrum shaping can depend on the
long term signal-to-noise ratio (SNR) in such a way that
less shaping is applied at high SNR, in order to preserve
high quality in noiseless speech conditions.
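The shaping rules of block 67 can be sketched as per-band gains. This is an illustrative sketch: bands at or below the long-term noise spectrum get the full attenuation, bands more than a margin above it are left untouched, and the maximum attenuation is reduced during pauses and at high long-term SNR. The margin, attenuation depth, scaling factors and 30 dB threshold are all assumptions, and auditory masking and level-dependent control are omitted.

```python
import math


def shaping_gains(short_term, long_term_noise, speech_flag, long_term_snr_db,
                  margin_db=6.0, max_att_db=10.0):
    """Per-band gains: attenuate bands near the noise floor, keep bands clearly above it."""
    max_att = max_att_db
    if not speech_flag:
        max_att *= 0.5        # less shaping in pauses: gain processing takes over
    if long_term_snr_db > 30.0:
        max_att *= 0.25       # preserve quality in (nearly) noiseless conditions
    gains = []
    for s, n in zip(short_term, long_term_noise):
        ratio_db = 10.0 * math.log10(max(s, 1e-12) / max(n, 1e-12))
        if ratio_db >= margin_db:
            gains.append(1.0)  # clearly above the noise: likely speech, untouched
        else:
            closeness = (margin_db - max(ratio_db, 0.0)) / margin_db  # 1 at/below noise
            gains.append(10.0 ** (-max_att * closeness / 20.0))
    return gains
```

The resulting per-band gains describe the desired spectrum that block 68 then maps back to LP coefficients.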

As soon as the desired spectrum shaping has been
calculated for the current frame, the original LP
coefficients have to be converted according to the
desired spectrum. This is carried out in the
spectrum-to-LP mapping block 68. The mapping can again be
realised as a codebook mapping that uses the original LPC
and the desired spectrum as input parameters.
Alternatively, new LP coefficients could be calculated
directly from the desired spectrum by converting the
spectrum to an LP spectrum envelope and then converting
this envelope to LP coefficients.
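The direct alternative can be sketched with one standard route, named plainly here since the text does not fix the method: take the inverse DFT of the desired power spectrum to obtain an autocorrelation sequence, then solve for the LP coefficients with the Levinson-Durbin recursion. The sketch assumes the spectrum is sampled at w_k = 2*pi*k/N over the full circle and uses the convention A(z) = 1 + sum of a_k z^-k.

```python
import math


def autocorrelation_from_power(power, order):
    """Inverse DFT of a real, symmetric power spectrum sampled at w_k = 2*pi*k/N."""
    n = len(power)
    return [sum(p * math.cos(2.0 * math.pi * k * m / n) for k, p in enumerate(power)) / n
            for m in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + sum a_k z^-k from autocorrelation r[0..order]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                   # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k               # remaining prediction error power
    return a[1:], err


def spectrum_to_lpc(power, order):
    """LP coefficients a_1..a_order whose envelope approximates the given spectrum."""
    r = autocorrelation_from_power(power, order)
    return levinson_durbin(r, order)[0]
```

Feeding in the spectrum of a first-order model 1/|1 - 0.9 e^-jw|^2 recovers a_1 close to -0.9 and a_2 close to 0.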

Finally, in the LP parameter re-quantisation block 69, 
the new LPC parameters are quantised or converted to LSP 
parameters and the old parameters are replaced with new 
ones in the coded frames. 

As mentioned previously, a signal dynamics expanding
function can be used together with the spectrum shaping,
or even alone. If it is used alone, only a slight
expansion is allowed, as it might cause a noise
modulation effect. Basically, in expansion, the lower the
signal level, the more attenuation is applied. The
expansion threshold is controlled by the noise level
estimate in such a way that a frame or sub-frame
exceeding the noise level estimate is not attenuated.
Furthermore, the VAD 66 can control the expansion in such
a way that slightly less expansion is used whenever the
current frame is a speech frame. Thereby the attenuation
of low level speech phonemes can be minimised.
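The expansion rule can be sketched as a gain in dB that is zero above the noise level estimate and grows more negative the further the frame lies below it, with slightly less attenuation on speech frames. The slope, the 0.75 speech factor and the 6 dB limit (keeping the expansion slight) are assumptions.

```python
def expansion_gain_db(frame_db, noise_db, speech_flag, slope=0.3, max_att_db=6.0):
    """Downward expansion: the further below the noise estimate, the more attenuation."""
    if frame_db >= noise_db:
        return 0.0                   # exceeds the noise level estimate: not attenuated
    att = slope * (noise_db - frame_db)
    if speech_flag:
        att *= 0.75                  # slightly less expansion on speech frames
    return -min(att, max_att_db)


def db_to_linear(gain_db):
    """Linear gain handed on to the linear-to-parameter domain mapping."""
    return 10.0 ** (gain_db / 20.0)
```

The linear value from `db_to_linear` is the kind of desired gain that blocks 62 and 63 map to new gain parameters.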

As soon as the desired linear gain for the current frame
or sub-frame has been found, the linear-to-parameter
domain mapping and gain parameter re-quantisation can be
carried out in blocks 62 and 63 as described with
reference to the gain control. As a result, modified gain
and LPC parameters are transmitted with the other speech
parameters over the transmission medium.
Figure 7 shows a block diagram of an echo suppression
device that can be integrated in first processing means
of a network element according to the invention for echo
suppression in the parameter domain.

A first input line is connected to a first decoder 70 and
a second input line to a second decoder 71, both decoders
70, 71 being connected in turn to an echo analysis block
72. The output of the first decoder 70 is further
connected via a noise estimation block 73, a comfort
noise generation block 74 and an encoder 75 to one
terminal of a switch 76. The switch 76 can connect the
output line either to the encoder 75 or to the first
input line. The echo analysis block 72 controls this
switch 76.

In order to determine whether a signal transmitted from a
near end to a far end comprises an echo, and to be able
to suppress or cancel such an echo, signals from both
transmission directions have to be analysed. Therefore,
two decoders 70, 71 are employed for linearising the
signals from the near end (the point where the echo is
reflected back) as "send in" signals and from the far end
as "receive in" signals, respectively. It is easier and
more accurate to carry out echo analysis in the linear
domain. In the echo analysis block 72, the signal levels
of the two linearised signals are estimated. If the level
ratio of the near-end and far-end signals is lower than a
threshold value, the near-end signal is considered to be
an echo, and comfort noise is inserted into the signal
that is to be transmitted to the far end as "send out"
signal. If there is an acoustic echo, a special filtering
can be used for far-end signal estimation to improve the
double talk performance of the echo suppression, as
described e.g. in document WO 9749196. In order to get
the correct result from the signal comparison, the echo
path delay has to be known. If the delay is variable, a
delay estimation might be needed to determine the correct
delay value. A cross-correlation can be used for the
delay estimation.
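The echo analysis of block 72 can be sketched in two steps: estimate the echo path delay as the lag that maximises the magnitude of the cross-correlation between the far-end and near-end signals, then compare the delay-aligned levels against a threshold. The -6 dB threshold, the sample-based lag search and the function names are assumptions made for this sketch.

```python
import math


def frame_power(x):
    return sum(v * v for v in x) / len(x)


def estimate_delay(far, near, max_delay):
    """Lag in samples that maximises |cross-correlation| of far-end and near-end."""
    def xcorr(d):
        return abs(sum(far[n - d] * near[n] for n in range(d, len(near))))
    return max(range(max_delay + 1), key=xcorr)


def is_echo(near_frame, far_frame, threshold_db=-6.0):
    """Treat the near-end frame as echo if it is much weaker than the aligned far end."""
    far_p = frame_power(far_frame)
    if far_p <= 0.0:
        return False                 # far end silent: nothing to echo
    ratio_db = 10.0 * math.log10(max(frame_power(near_frame), 1e-12) / far_p)
    return ratio_db < threshold_db
```

In use, the far-end frame passed to `is_echo` is the one shifted by the estimated delay, so that the level comparison relates each near-end frame to the far-end signal that could have caused it.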

In the noise estimation block 73, an accurate noise 
estimate of the linearised near-end signal received from 
the first decoder 70 is formed. Preferably, background 
noise is estimated in both the level and the spectral
domain. The estimation method can be the same as the 
method described for noise suppression. Equally, other 
methods can be used, e.g. methods based on filter banks 
or Fourier transformation. 

The comfort noise is then generated in the comfort noise
generation block 74 using the noise estimates received
from the noise estimation block 73. To generate the
comfort noise, level scaled white noise is fed through a
synthesis filter whose envelope spectrum is equivalent to
the one estimated in the noise estimation block 73. The
synthesis filter can therefore be an LP filter or a
filter bank.
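Comfort noise generation with an LP synthesis filter can be sketched as follows, assuming A(z) = 1 + sum of a_k z^-k. The white noise is level scaled before filtering, using the power gain of the filter's (truncated, 512-tap) impulse response, so that the output has roughly the estimated noise RMS. The truncation length and function names are assumptions.

```python
import math
import random


def all_pole_filter(x, a):
    """LP synthesis filter 1/A(z), A(z) = 1 + sum a_k z^-k, zero initial state."""
    y = []
    for i, xi in enumerate(x):
        acc = xi
        for k, ak in enumerate(a, start=1):
            if i - k >= 0:
                acc -= ak * y[i - k]
        y.append(acc)
    return y


def comfort_noise(a, noise_rms, n, rng=None):
    """Level scaled white noise through 1/A(z), targeting an output RMS of noise_rms."""
    if rng is None:
        rng = random.Random()
    impulse = [1.0] + [0.0] * 511
    power_gain = sum(h * h for h in all_pole_filter(impulse, a))  # truncated response
    sigma = noise_rms / math.sqrt(power_gain)
    excitation = [rng.gauss(0.0, sigma) for _ in range(n)]
    return all_pole_filter(excitation, a)
```

The output level only matches `noise_rms` on average; individual short frames fluctuate, as real background noise does.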

Finally, the generated comfort noise is encoded by the 
encoder 75 to form a frame or a sub-frame including an 
encoded comfort noise parameter. 
If an echo was detected by the echo analysis block 72 for
the current send in frame or sub-frame, the switch 76 is
switched by the echo analysis block 72 to connect the
encoder 75 to the output line, and the current frame or
sub-frame is replaced with the generated encoded comfort
noise parameter. If no echo is detected, the switch 76
remains in, or is switched by the echo analysis block 72
to, the position connecting the first input line with the
output line, so that the original frame or sub-frame is
forwarded to the output line without being replaced.

By using the described method, tandem speech coding can 
be avoided both in speech and comfort noise frames and 
high quality speech can be provided. 

Alternatively, and in order to save processing and memory
resources, the speech encoder can be omitted by
generating comfort noise directly in the parameter
domain. In the parameter domain comfort noise generation,
a long-term LP spectrum envelope of the background noise
is averaged as described with reference to figure 6.
Additionally, a long-term excitation gain parameter is
averaged with the same updating principles as for the LP
spectrum envelope, i.e. it is updated if the VAD flag is
false. Typically only the fixed codebook gain value needs
to be averaged, as the adaptive codebook gain value is
close to zero for noise-like signals. When a comfort
noise frame or sub-frame needs to be transmitted to the
far end, the original LPC and excitation gain parameters
are replaced with the averaged LPC and gain parameters.
Moreover, the original excitation pulses within the frame
are replaced with random pulses, which represent white
noise in the parameter domain. If discontinuous
transmission (DTX) is used in the send in direction,
excitation pulses need not be transmitted. Instead, only
the averaged LPC and gain parameters are transmitted in
the silence descriptor (SID) frame, which is standardised
for most speech codecs. In discontinuous transmission,
random excitation pulses are generated at the decoder
end.

Figure 8 shows a block diagram of an echo cancellation 
device that can be integrated in first processing means 
of a network element according to the invention for echo 
cancellation in the parameter domain. 

A first input line is connected directly to a first 
decoder 80 and a second input line is connected via a 
FIFO (first in first out) frame memory 87 to a second 
decoder 81, both decoders 80, 81 being connected in turn 
to an adaptive filter 82. The adaptive filter 82 is
connected to an NLP and comfort noise generation block
84, and the first decoder 80 is connected to a second
input of the same block 84 via a noise estimation block
83. The output of the NLP and comfort noise generation
block 84 is connected via an encoder 85 to a switch 86.
The switch 86 can connect the output line either to the
encoder 85 or to the first input line. Outputs of the
first decoder 80, the second decoder 81 and the adaptive
filter 82 are also connected to inputs of a control logic
88. The control logic 88 controls the adaptive filter 82,
the NLP and comfort noise generation block 84 and the
switch 86.

The proposed echo cancellation is quite similar to the
echo suppression described above. The adaptive filter 82
and the control logic 88 are included to reduce the echo
signal before a residual echo suppression function is
applied by a non-linear processor (NLP) 84. For the
linear adaptive filtering, signals from both directions
have to be linearised by the local decoders 80, 81. As
the returning echo signal has passed through two speech
codings, the cumulated non-linear distortions
considerably reduce the effectiveness of linear adaptive
filtering. It might therefore be desirable to include
non-linear echo modelling in the echo cancellation, as
described e.g. in document WO 9960720. Moreover, delays
introduced into the echo path by speech coding,
transmission or other signal processing can be
compensated by the FIFO frame memory block 87. Thus the
number of taps of the adaptive filter 82 can be reduced
and less processing capacity is required.
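The linear part of this structure can be sketched with a normalised LMS (NLMS) adaptive filter preceded by a bulk-delay FIFO. NLMS is a common choice but is an assumption here, since the text does not name the adaptation algorithm. The FIFO is sample based rather than frame based, non-linear echo modelling is not shown, and the class name and parameters are hypothetical. The FIFO absorbs the pure delay, so a short filter covers only the dispersive part of the echo path.

```python
from collections import deque


class NlmsEchoCanceller:
    """NLMS echo canceller with a bulk-delay FIFO in front of the adaptive filter."""

    def __init__(self, taps, bulk_delay=0, mu=0.5, eps=1e-6):
        self.fifo = deque([0.0] * bulk_delay)       # stand-in for FIFO frame memory 87
        self.x = deque([0.0] * taps, maxlen=taps)   # newest (delayed) far-end sample first
        self.w = [0.0] * taps                       # adaptive filter coefficients
        self.mu, self.eps = mu, eps

    def process(self, far_sample, near_sample):
        """Return the residual (near-end minus echo estimate) for one sample."""
        self.fifo.append(far_sample)
        self.x.appendleft(self.fifo.popleft())
        echo_est = sum(w * x for w, x in zip(self.w, self.x))
        err = near_sample - echo_est
        step = self.mu * err / (sum(x * x for x in self.x) + self.eps)
        self.w = [w + step * x for w, x in zip(self.w, self.x)]
        return err
```

In the test scenario, an echo path with 41 and 42 samples of delay is modelled by a 40-sample FIFO and only 8 taps instead of more than 42.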

The function of the noise estimation block 83 and the NLP 
and comfort noise generation block 84 can be similar to 
the above described noise suppression, although the 
control of the NLP 84 can be different as more 
parameters, e.g. echo path model, achieved echo 
attenuation, send in, receive in and residual echo 
signals, can be utilised in the NLP decision. This is 
handled within control logic block 88. The output of the 
NLP and comfort noise generation block 84 is encoded by 
the encoder 85. 
The switch 86 is provided for switching between speech
frames received at the send in port and the encoded
output of the NLP/comfort noise block 84, i.e. the output
of the send out port is either a bypassed send in frame
(or sub-frame) or an echo cancelled frame (or sub-frame).
A selection criterion could be as follows.

If there is no speech activity or if the signal level of
the far end is low enough, send in frames are bypassed.
Otherwise the output of the NLP/comfort noise block 84 is
chosen as output after encoding by the encoder 85.
Therefore, a TFO stream is left untouched if only the
near end talks or if there is silence in both directions.
If only the far end talks, encoded comfort noise is
inserted. If there is a double talk condition, either
comfort noise or the output of the adaptive filter 82 is
chosen for the send out signal. This depends on the state
of the NLP 84 and typically varies during the double
talk. A benefit of this method is that there is tandem
free operation for the near-end signal most of the time.
Tandem coded frames are sent in the far-end direction
only during double talk while the NLP block 84 is
inactive. However, this is subjectively no more annoying
than conventional echo cancellation, as NLP switching
already introduces some artefacts on near-end speech, and
because direct acoustic masking and the side-tone of the
far-end signal diminish the audibility of NLP artefacts
during double talk.
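The selection criterion for the switch 86 can be written as a small decision function. In this sketch, the activity flags and the NLP state are assumed to be supplied by the control logic 88; how they are derived is not shown, and the names are hypothetical.

```python
from enum import Enum


class SendOut(Enum):
    BYPASS = "original send-in frame (tandem free)"
    COMFORT_NOISE = "encoded comfort noise"
    FILTER_OUTPUT = "encoded adaptive filter output"


def select_send_out(near_active, far_active, nlp_active):
    """Hypothetical decision for switch 86 following the criterion described above."""
    if not far_active:
        return SendOut.BYPASS          # near end only, or silence: TFO left untouched
    if not near_active:
        return SendOut.COMFORT_NOISE   # far end only: the send-in signal is echo
    # double talk: the outcome follows the current NLP state
    return SendOut.COMFORT_NOISE if nlp_active else SendOut.FILTER_OUTPUT
```

Only the last branch, double talk with the NLP inactive, produces tandem coded frames, matching the observation that tandem free operation is kept most of the time.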

Alternatively, in order to save processing and memory
resources, the encoder could be omitted by generating
comfort noise directly in the parameter domain as
described with reference to figure 7.
Claims



1. Network element (1) for enhancing the quality of
digitised analogue signals transmitted at least in
parameterised coded form via a digital network to
which the network element (1) has access, comprising:

- a payload extraction block (20,30,40) for
extracting coded digitised analogue signals from
the digital network, which coded digitised
analogue signals include at least in part
parameterised coded digitised analogue signals;

- first processing means (24,34,44) for processing
the extracted parameterised coded digitised
analogue signals in the parameter domain with
functions suitable to enhance the quality of the
digitised analogue signals;

- second processing means (26,36,46) for processing
at least part of the extracted coded digitised
analogue signals in the linear domain with
functions suitable to enhance the quality of the
digitised analogue signals;

- a payload insertion block (21,31,41) for inserting
processed coded digitised analogue signals into the
digital network; and

- analysing and selecting means (23,27,33,37) for
determining the quality improvement of the
digitised analogue signals resulting from a
processing of the extracted coded digitised
analogue signals in the parameter domain and from
a processing of the extracted coded digitised
analogue signals in the linear domain, and for
causing at least those coded digitised analogue
signals processed by the processing means
(24,26,34,36) that lead to the better improvement
to be inserted back into the digital network by the
payload insertion block (21,31,41).

2. Network element (1) according to claim 1, wherein the 
functions for processing the parameterised coded 
digitised analogue signals by the first processing 
means (24,34,44) include echo cancellation, noise 
reduction, and/or level control. 

3. Network element (1) according to one of the preceding 
claims, wherein the functions for processing coded 
digitised analogue signals by the second processing 
means (26,36,46) include echo cancellation, noise 
reduction, level control, transcoding and/or speech
mixing.

4. Network element (1) according to one of the preceding 
claims, wherein the analysing and selecting means are 
suited for analysing the digitised analogue signals 
before and after processing by the first and the 
second processing means for determining the better 
quality improvement. 
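Claims 1 and 4 select the processing domain by analysing the signal before and after processing in each domain. The following is a minimal sketch of that comparison, assuming some scalar quality measure is available. The names `choose_domain` and `residual_energy_quality` are hypothetical, the toy measure is for illustration only, and the two tie-breaking rules marked in the comments are assumptions rather than part of the claims.

```python
from typing import Callable, Sequence, Tuple

Frame = Sequence[float]


def choose_domain(original: Frame, param_out: Frame, linear_out: Frame,
                  quality: Callable[[Frame], float]) -> Tuple[str, Frame]:
    """Return the domain whose output shows the larger quality improvement."""
    baseline = quality(original)                          # analysis before processing
    improvement_param = quality(param_out) - baseline     # after parameter-domain processing
    improvement_linear = quality(linear_out) - baseline   # after linear-domain processing
    # Assumption: if neither domain improves quality, pass the signal on unchanged.
    if max(improvement_param, improvement_linear) <= 0.0:
        return "none", original
    # Assumption: ties favour the parameter domain, which avoids a decode/re-encode stage.
    if improvement_param >= improvement_linear:
        return "parameter", param_out
    return "linear", linear_out


def residual_energy_quality(frame: Frame) -> float:
    """Toy quality measure for illustration only: less residual energy scores higher."""
    return -sum(x * x for x in frame)
```

In a real network element, the quality measure would be whatever the analysing means (23,27,33,37) compute. Claim 5 mentions a neural network for this purpose.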

5. Network element (1) according to one of the preceding 
claims, wherein the analysing and selecting means 
comprises a neural network for determining the better 
quality improvement. 
6. Network element (1) according to one of the preceding
claims, characterised by a bad frame handler
(22,32,42) for detecting missing frames and/or frames
out of order in the extracted coded digitised analogue
signals, and for regenerating missing frames and/or
reordering out-of-order frames in the extracted
signals.
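The bad frame handler of claim 6 can be pictured as follows. This is a minimal sketch, assuming each extracted frame carries a sequence number (as RTP packets do). The names `bad_frame_handler` and `SILENCE` are hypothetical. Missing frames are regenerated by simply repeating the last good frame. Real speech codecs define their own bad-frame substitution, usually with progressive attenuation, and sequence-number wrap-around is ignored here.

```python
from typing import Iterable, List, Tuple

SILENCE = b"\x00"  # hypothetical placeholder payload used before any frame arrives


def bad_frame_handler(received: Iterable[Tuple[int, bytes]],
                      first_seq: int, last_seq: int) -> List[bytes]:
    """Return frames first_seq..last_seq in order, regenerating missing ones."""
    # Indexing by sequence number undoes any reordering in the network.
    by_seq = {seq: payload for seq, payload in received}
    output: List[bytes] = []
    previous = SILENCE
    for seq in range(first_seq, last_seq + 1):
        payload = by_seq.get(seq)
        if payload is None:
            # Missing frame: regenerate it by repeating the last good frame.
            payload = previous
        output.append(payload)
        previous = payload
    return output
```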



7. Network element (1) according to one of the preceding 
claims, characterised by analysing means (23,33) for 
determining whether any processing is to be applied 
to the extracted parameterised coded digitised 
analogue signals and for selecting the functions that 
are to be applied to the extracted coded digitised 
analogue signals by the first processing means 
(24,34) and/or the second processing means (26,36) 
depending on the extracted signals and/or an external 
control signal. 

8. Network element (1) according to one of the preceding 
claims, characterised by a control block (29,39,49) 
for receiving an external control signal and for 
controlling the selection of the processing applied 
to the extracted coded signals directly or via
analysing means (23,33,37).

9. Network element (1) according to one of the preceding 
claims, wherein the payload extraction block (20) is 
suitable to extract parameterised coded digitised 
analogue signals from the IP stack of a packet-based 
network and wherein the payload insertion block (21)
is suitable to insert parameterised coded digitised
analogue signals into the IP stack of said packet-based
network.

10. Network element (1) according to claim 9, wherein the
network element (1) further comprises:

- decoding means (25) for decoding said
parameterised coded digitised analogue signals and
for forwarding the decoded signals to the second
processing means (26);

- encoding means (28) for encoding the signals
processed and output by the second processing
means (26);

- first selecting means (23) for receiving the
extracted coded digitised analogue signals from
the payload extraction block (20) and for
forwarding the extracted coded digitised analogue
signals either to the first processing means (24)
or to the second processing means (26) via the
decoding means (25);

- second selecting means (27) for receiving as
inputs the outputs of the first processing means
(24) and of the second processing means (26) and
for selecting which output is to be forwarded to
the payload insertion block (21); and

- analysing means (23) for determining whether the
extracted coded digitised analogue signals are to
be processed by the first processing means (24) or
the second processing means (26) and for
controlling the first selecting means (23) and the
second selecting means (27) accordingly.



11. Network element according to one of claims 1 to 8,
wherein the payload extraction block (30,40) is
suitable for extracting a TFO stream and a PCM stream
from a TDM-based network, for separating said TFO
stream from said PCM stream and for providing the TFO
stream as parameterised coded digitised analogue
signals, and wherein the payload insertion block
(31,41) is suitable for combining a TFO stream with a
PCM stream and for inserting the combined stream, or
only a PCM stream if no TFO stream is provided to the
payload insertion block (31,41), into the TDM-based
network.
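To our understanding of GSM tandem free operation (3GPP TS 28.062), TFO frames are carried in the one or two least significant bits of each 8-bit PCM sample. A minimal sketch of the separation and combination in claim 11 follows, under that assumption. TFO synchronisation patterns, TFO messages and frame assembly are all omitted, and `split_tfo` and `combine_tfo` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple


def split_tfo(octets: bytes, n_lsb: int = 2) -> Tuple[List[int], bytes]:
    """Separate the TFO bits carried in the n_lsb LSBs of each PCM octet."""
    mask = (1 << n_lsb) - 1
    tfo_bits = [octet & mask for octet in octets]
    # The PCM samples are passed on as received; the borrowed LSBs only
    # add a small amount of noise to the PCM signal.
    return tfo_bits, bytes(octets)


def combine_tfo(pcm: bytes, tfo_bits: Optional[Sequence[int]], n_lsb: int = 2) -> bytes:
    """Overwrite the n_lsb LSBs of each PCM octet with TFO bits, if any."""
    if tfo_bits is None:
        return bytes(pcm)  # no TFO stream: insert the PCM stream alone
    mask = (1 << n_lsb) - 1
    return bytes((p & ~mask & 0xFF) | (t & mask) for p, t in zip(pcm, tfo_bits))
```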

12. Network element according to claim 11, wherein the
network element further comprises:

- decoding means (35) for decoding said TFO stream
and for forwarding the decoded signals to the
second processing means (36);

- encoding means (38) for encoding the signals
processed and output by the second processing
means (36);

- means (33) for receiving the TFO stream from the
payload extraction block (30) and for forwarding
the TFO stream to the first processing means (34)
and via the decoding means (35) to the second
processing means (36);

- selecting means (37) for receiving as inputs the
outputs of the first processing means (34) and of
the encoding means (38) and for forwarding the
output with the better quality improvement of the
digitised analogue signal to the payload insertion
block (31);

- analysing means (33) for determining whether a
processing in the first processing means (34) or
in the second processing means (36) results in a
better quality improvement of the digitised
analogue signals and for controlling the selecting
means (37) accordingly; and

- means (31) for forming a PCM stream out of the
output of the second processing means (36) and for
forwarding said PCM stream to the payload
insertion block (31).

13. Network element according to claim 12, wherein the
payload extraction block (30,40) is suitable for
providing in addition the PCM stream as
non-parameterised coded digitised analogue signals,
further comprising means (33) for forwarding the
extracted PCM stream to the second processing means
(36) and for forwarding the output of the second
processing means (36) to the payload insertion block
(31), if no TFO stream is available for processing.

14. Network element according to claim 11,
wherein the payload extraction block (30,40) is
suitable for providing in addition the PCM stream as
non-parameterised coded digitised analogue signals,
wherein the TFO stream is processed in the first
processing means (44) in the parameter domain and the
PCM stream is processed in the second processing
means (46) in the linear domain, and wherein at least
the processed PCM stream is forwarded to the payload
insertion block (41).

15. Network element (1) according to one of the preceding
claims, wherein the first processing means (24,34,44)
is suited for comparing gain parameters of the
extracted parameterised coded digitised analogue
signals with a desired gain, for forming
corresponding new gain parameters and for replacing
the original gain parameters by the new gain
parameters in the extracted parameterised coded
digitised analogue signals.

16. Network element (1) according to claim 15, wherein
the first processing means (24,34,44) comprises:

- decoding means (50) for linearising extracted
parameterised coded digitised analogue signals and
for providing decoded gain parameters of the
digitised analogue signals;

- means (51) for estimating the level of the
linearised coded digitised analogue signal and for
determining desired gain values based on the
estimated level of the digitised analogue signal
and a desired target level of the digitised
analogue signal;

- means (52) for determining, from the decoded gain
parameters of the coded digitised analogue signal
and the desired gain values, new gain parameters
suitable for achieving the desired gain by
linear-to-parameter domain mapping; and

- means (53) for re-quantising the new gain
parameters and for replacing the original gain
parameters with the new gain parameters in the
parameterised coded digitised analogue signal.
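The gain path of claim 16 (means 51 to 53) can be sketched as follows. The sketch assumes a codec with a scalar excitation gain quantised by a table, and a linear-to-parameter mapping in which the synthesised signal scales proportionally with that gain. `GAIN_TABLE`, the 12 dB clamp and all function names are hypothetical. Real codecs often quantise gains predictively, which makes the mapping more involved.

```python
import bisect
from typing import Sequence, Tuple

# Hypothetical ascending scalar gain quantiser table; a real codec defines its own.
GAIN_TABLE: Tuple[float, ...] = tuple(0.1 * 1.25 ** k for k in range(32))


def desired_gain(level_db: float, target_db: float, max_gain_db: float = 12.0) -> float:
    """Means 51: linear gain moving the estimated level towards the target (clamped)."""
    gain_db = max(-max_gain_db, min(max_gain_db, target_db - level_db))
    return 10.0 ** (gain_db / 20.0)


def requantise(value: float, table: Sequence[float] = GAIN_TABLE) -> int:
    """Means 53: index of the table entry closest to value (table must be ascending)."""
    i = bisect.bisect_left(table, value)
    if i == 0:
        return 0
    if i == len(table):
        return len(table) - 1
    return i if table[i] - value < value - table[i - 1] else i - 1


def replace_gain_index(old_index: int, level_db: float, target_db: float) -> int:
    """Means 52 and 53: map the desired gain into the parameter domain and requantise."""
    # Simplified mapping: assume the synthesised signal scales with the excitation gain.
    new_gain = GAIN_TABLE[old_index] * desired_gain(level_db, target_db)
    return requantise(new_gain)
```

With this table, each index step is about 1.9 dB. Raising a -30 dB signal to a -24 dB target therefore moves index 10 up by three steps, to 13.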

17. Network element (1) according to claim 16, wherein
the means (51) for estimating the level of the
digitised analogue signal comprises a voice activity
detector for ensuring that only speech signals are
taken into account in the signal level estimate.
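A minimal sketch of a VAD-gated level estimate as in claim 17 follows. It assumes linearised frames scaled to a full scale of 1.0. A crude energy threshold stands in for a real voice activity detector, and the names, the -50 dB threshold and the smoothing constant are hypothetical.

```python
import math
from typing import Iterable, Optional, Sequence


def frame_level_db(frame: Sequence[float]) -> float:
    """Mean power of a linearised frame in dB relative to full scale 1.0."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(power) if power > 0.0 else -120.0


def speech_level_db(frames: Iterable[Sequence[float]],
                    vad_threshold_db: float = -50.0,
                    smoothing: float = 0.9) -> float:
    """Smoothed level estimate that is updated only for frames classed as speech."""
    level: Optional[float] = None
    for frame in frames:
        frame_db = frame_level_db(frame)
        # Crude energy VAD: quiet frames count as noise or silence and
        # therefore do not update the speech level estimate.
        if frame_db < vad_threshold_db:
            continue
        level = frame_db if level is None else smoothing * level + (1.0 - smoothing) * frame_db
    return level if level is not None else -120.0
```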

18. Network element (1) according to one of the preceding 
claims, wherein the first processing means (24,34,44) 
is suited for attenuating noise portions and low 
level signal portions of the extracted parameterised 
coded digitised analogue signals in the time domain 
and for correspondingly replacing the gain parameters 
in the extracted parameterised coded digitised 
analogue signals. 
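The time-domain noise attenuation of claim 18 can be sketched as a per-frame decision on the codec gain parameters. This sketch assumes that per-frame levels of the linearised signal are available. It uses a simple minimum-tracking noise floor, which is one common choice and not necessarily the source's. All names, margins and rates are hypothetical. The scaled gains would still need requantising, for example with a table lookup as in claim 16.

```python
from typing import List, Sequence


def attenuate_noise_gains(frame_levels_db: Sequence[float], gains: Sequence[float],
                          margin_db: float = 6.0, attenuation_db: float = 10.0,
                          rise_db: float = 0.5) -> List[float]:
    """Scale codec gain parameters down for frames that look like noise."""
    noise_db = frame_levels_db[0] if frame_levels_db else 0.0
    factor = 10.0 ** (-attenuation_db / 20.0)
    new_gains: List[float] = []
    for level_db, gain in zip(frame_levels_db, gains):
        # Minimum-tracking noise floor: follows drops at once, rises slowly.
        noise_db = min(level_db, noise_db + rise_db)
        # Noise or low-level frame: attenuate by reducing its gain parameter.
        if level_db < noise_db + margin_db:
            new_gains.append(gain * factor)
        else:
            new_gains.append(gain)
    return new_gains
```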

19. Network element (1) according to one of the preceding 
claims, wherein the first processing means (24,34,44) 
is suited for attenuating frequency portions of noise 
in the extracted parameterised coded digitised 
analogue signals which have approximately the same 
energy as a noise estimate and for correspondingly 
replacing linear prediction parameters in the 
extracted parameterised coded digitised analogue 
signals.

20. Network element (1) according to claim 19, wherein
the first processing means (24,34,44) comprises:

- a decoder (60) for decoding linear prediction
coefficients from extracted coded digitised
analogue signals;

- means (64) for estimating a long-term power
spectrum of the noise of the digitised analogue
signals;

- means (65) for estimating a short-term power
spectrum of the noise of the digitised analogue
signals;

- means (67) for determining a desired spectrum
depending on the difference between the long-term
spectrum and the short-term spectrum;

- means (68) for determining new linear prediction
coefficients according to the desired spectrum; and

- means (69) for quantising the new linear
prediction coefficients or for converting them to
line spectral pair parameters, and for substituting
them for the old parameters in the extracted
parameterised coded digitised analogue signal.
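The spectral noise suppression of claim 20 can be sketched with standard LPC tools, under several assumptions. The short-term spectrum is taken as the LPC envelope scaled by a frame gain. The long-term noise spectrum is a recursive average over noise-only frames. The desired spectrum uses spectral subtraction with a floor. New coefficients come from the autocorrelation of the desired spectrum via the Levinson-Durbin recursion. All names, `N_BINS` and the constants are hypothetical, and quantisation or LSP conversion (means 69) is not shown.

```python
import math
from typing import List, Sequence

N_BINS = 64  # hypothetical number of frequency points on [0, pi]


def lpc_power_spectrum(a: Sequence[float], gain: float = 1.0,
                       n_bins: int = N_BINS) -> List[float]:
    """gain / |A(e^jw)|^2 with A(z) = 1 + sum_k a[k] z^-(k+1)."""
    spectrum = []
    for i in range(n_bins):
        w = math.pi * i / (n_bins - 1)
        re = 1.0 + sum(ak * math.cos(w * (k + 1)) for k, ak in enumerate(a))
        im = -sum(ak * math.sin(w * (k + 1)) for k, ak in enumerate(a))
        spectrum.append(gain / (re * re + im * im))
    return spectrum


def update_noise(noise: Sequence[float], short: Sequence[float],
                 is_noise: bool, alpha: float = 0.95) -> List[float]:
    """Long-term noise spectrum (means 64): recursive average over noise-only frames."""
    if not is_noise:
        return list(noise)
    return [alpha * n + (1.0 - alpha) * s for n, s in zip(noise, short)]


def autocorrelation(psd: Sequence[float], order: int) -> List[float]:
    """r[k] by trapezoidal integration of P(w) cos(kw) over [0, pi]."""
    n = len(psd)
    r = []
    for k in range(order + 1):
        acc = 0.0
        for i, p in enumerate(psd):
            weight = 0.5 if i in (0, n - 1) else 1.0
            acc += weight * p * math.cos(math.pi * i * k / (n - 1))
        r.append(acc)
    return r


def levinson(r: Sequence[float], order: int) -> List[float]:
    """Levinson-Durbin recursion returning a[1..order] of A(z)."""
    a: List[float] = []
    err = r[0]
    for i in range(order):
        acc = r[i + 1] + sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err
        a = [a[j] + k * a[i - 1 - j] for j in range(i)] + [k]
        err *= 1.0 - k * k
    return a


def suppress_noise_lpc(a: Sequence[float], frame_gain: float,
                       noise_psd: Sequence[float], floor: float = 0.1) -> List[float]:
    """Means 67 and 68: new LPC coefficients following the noise-reduced spectrum."""
    short = lpc_power_spectrum(a, frame_gain, len(noise_psd))
    # Bins whose energy is close to the noise estimate are attenuated down to the floor.
    desired = [max(s - n, floor * s) for s, n in zip(short, noise_psd)]
    return levinson(autocorrelation(desired, len(a)), len(a))
```

With a zero noise estimate the coefficients come back essentially unchanged. A flat noise estimate attenuates the weak high-frequency bins of a low-pass envelope, so the new envelope becomes more strongly low-pass.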

21. Network element (1) according to one of the preceding 
claims, wherein the payload extraction block 
(20,30,40) is suited for extracting in addition 
further coded digitised analogue signals from the 
digital network transmitted in the opposite direction 
as compared to the extracted coded digitised analogue 
signals, which further coded digitised analogue 
signals include at least in part parameterised coded 
digitised analogue signals; and wherein the first 
processing means (24,34,44) is suited for comparing 
the first and the further parameterised coded 
digitised analogue signals in order to detect echoes 
in the first parameterised coded digitised analogue 
signals and for replacing portions of the first
parameterised coded digitised analogue signal by
comfort noise portions, if an echo was detected in
the portion of the first parameterised coded
digitised analogue signal.
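Claim 21 detects echo by comparing the two transmission directions. A minimal sketch follows, assuming per-frame levels of both linearised directions are available. A send-direction frame is treated as echo when the far end was recently active and the frame is at least an assumed echo return loss (`erl_db`) quieter than the loudest recent far-end frame. All names and thresholds are hypothetical. This crude energy rule cannot reliably separate echo from quiet near-end speech during double talk.

```python
from typing import List, Sequence


def detect_echo(near_db: float, far_history_db: Sequence[float],
                erl_db: float = 6.0, far_active_db: float = -40.0) -> bool:
    """Echo if the far end was active and the near-end frame is erl_db quieter."""
    loudest_far = max(far_history_db, default=-120.0)
    return loudest_far > far_active_db and near_db < loudest_far - erl_db


def replace_echo_frames(near_db: Sequence[float], far_db: Sequence[float],
                        near_payloads: Sequence[bytes], comfort_noise: bytes,
                        max_delay: int = 8) -> List[bytes]:
    """Swap frames judged to be echo for a comfort noise frame."""
    out: List[bytes] = []
    for i, (level, payload) in enumerate(zip(near_db, near_payloads)):
        # Far-end frames within the assumed maximum echo delay.
        history = far_db[max(0, i - max_delay): i + 1]
        out.append(comfort_noise if detect_echo(level, history) else payload)
    return out
```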

22. Network element (1) according to claim 21, wherein
the first processing means (24,34,44) comprises:

- a first decoder (70) for linearising extracted
coded digitised analogue signals from a first
direction;

- a second decoder (71) for linearising the further
extracted coded digitised analogue signals from an
opposite direction;

- an echo analysis block (72) for detecting an echo
in a portion of the first extracted signal; and

- means (72-76) for generating comfort noise and for
replacing the original portion of the first
extracted signal with a corresponding comfort
noise parameter in case an echo was detected.

23. Network element (1) according to one of the preceding 
claims, wherein the payload extraction block 
(20,30,40) is suited for extracting in addition 
further coded digitised analogue signals from the 
digital network transmitted in the opposite direction 
as compared to the extracted coded digitised analogue 
signals, which further coded digitised analogue 
signals include at least in part parameterised coded 
digitised analogue signals; and wherein the first 
processing means (24,34,44) is suited for attenuating 
an echo signal in the first parameterised coded 
digitised analogue signals making use of the further 
parameterised coded digitised analogue signals and 
for suppressing the residual echo signal. 

24. Network element (1) according to claim 23, wherein
the first processing means (24,34,44) comprises:

- a first decoder (80) for linearising extracted
coded digitised analogue signals from a first
direction;

- a second decoder (81) for linearising the further
extracted coded digitised analogue signals from an
opposite direction;

- an adaptive filter (82) and a control logic (88)
receiving linearised signals from the first and
the second decoder (80,81) for attenuating echo
signals in the linearised signal received from the
first decoder (80);

- a non-linear processor (84) for residual echo
suppression based on residual echo signals
received from the adaptive filter (82) and on a
noise estimation of the linearised signals from
the first direction; and

- means (83-86,88) for generating comfort noise
based on the residual echo suppression and for
replacing the original portion of the first
extracted signal with a corresponding comfort
noise parameter in case an echo was detected.
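The adaptive filter (82) and non-linear processor (84) of claim 24 can be sketched with a normalised LMS (NLMS) filter, a common choice that the source does not name. The NLP replaces a frame of small residual with white comfort noise at the estimated noise level. The step size, tap count, suppression ratio and the names `nlms_cancel` and `nlp` are hypothetical. Practical NLPs also shape comfort noise spectrally and track double talk via the control logic (88).

```python
import math
import random
from typing import List, Sequence


def nlms_cancel(far: Sequence[float], near: Sequence[float], taps: int = 32,
                mu: float = 0.5, eps: float = 1e-6) -> List[float]:
    """Subtract an adaptive FIR estimate of the far-end echo from the near-end signal."""
    w = [0.0] * taps   # echo path estimate
    x = [0.0] * taps   # far-end delay line, newest sample first
    residual: List[float] = []
    for f, d in zip(far, near):
        x = [f] + x[:-1]
        e = d - sum(wi * xi for wi, xi in zip(w, x))      # residual after cancellation
        step = mu * e / (eps + sum(xi * xi for xi in x))  # normalised step size
        w = [wi + step * xi for wi, xi in zip(w, x)]
        residual.append(e)
    return residual


def nlp(residual_frame: Sequence[float], far_rms: float, noise_rms: float,
        rng: random.Random, suppress_ratio: float = 0.1) -> List[float]:
    """Crude NLP: replace a frame of low residual echo by white comfort noise."""
    res_rms = math.sqrt(sum(e * e for e in residual_frame) / len(residual_frame))
    if far_rms > 0.0 and res_rms < suppress_ratio * far_rms:
        # Uniform noise on [-a, a] has RMS a / sqrt(3); match the noise estimate.
        a = noise_rms * math.sqrt(3.0)
        return [rng.uniform(-a, a) for _ in residual_frame]
    return list(residual_frame)
```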

25. Network element (1) according to one of claims 21 to
24, wherein the first processing means (24,34,44) is
suited for bypassing the first parameterised coded
digitised analogue signals without processing, if
there is no signal activity in the opposite direction
or if the signal level of the further extracted
parameterised coded digitised analogue signals in the
opposite direction is below a threshold level.

26. Network element (1) according to one of the preceding 
claims, wherein the extracted coded digitised 
analogue signals are coded speech or coded video. 
27. Method for enhancing the quality of digitised
analogue signals transmitted at least in
parameterised coded form via a digital network,
comprising:

- extracting coded digitised analogue signals from
the digital network, which coded digitised
analogue signals include at least in part
parameterised coded digitised analogue signals;

- determining the quality improvement of the
digitised analogue signals to be expected from a
processing of the extracted coded digitised
analogue signals in the parameter domain and from a
processing of the extracted coded digitised
analogue signals in the linear domain;

- processing the extracted parameterised coded
digitised analogue signals in the parameter domain,
at least if a greater quality improvement is
expected from processing in the parameter domain,
with functions suitable for enhancing the quality
of digitised analogue signals;

- processing at least part of the extracted coded
digitised analogue signals in the linear domain, at
least if a greater quality improvement is expected
from processing in the linear domain, with functions
suitable for enhancing the quality of digitised
analogue signals; and

- inserting into the digital network at least those
processed coded digitised analogue signals that were
processed in the domain in which processing was
expected to result in a greater quality
improvement.
28. Method according to claim 27, further comprising:

- decoding the extracted parameterised coded
digitised analogue signals for processing in the
linear domain; and

- encoding the processed decoded signals after
processing in the linear domain to form
parameterised coded digitised analogue signals
again.

29. Method according to claim 28, further comprising: 

- transforming the processed decoded signals to form
non-parameterised coded digitised analogue
signals; and

- inserting the non-parameterised coded digitised
analogue signals into the digital network.
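In a TDM network, the non-parameterised form of claim 29 is typically G.711 PCM. The following sketch assumes μ-law (A-law is the other G.711 variant, used for example in Europe) and 16-bit integer linear samples. It shows the transformation of processed linear samples into PCM octets and back. The function names are hypothetical. The bit manipulation follows the usual bias-and-segment formulation of G.711 μ-law.

```python
from typing import Iterable

BIAS = 0x84     # 132, the G.711 mu-law bias
CLIP = 32635    # largest magnitude that still fits after adding the bias


def linear_to_ulaw(sample: int) -> int:
    """Encode one 16-bit linear PCM sample as an 8-bit mu-law octet."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = magnitude.bit_length() - 8          # segment number 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(octet: int) -> int:
    """Decode one mu-law octet to the midpoint of its quantisation interval."""
    u = ~octet & 0xFF
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if u & 0x80 else magnitude


def to_pcm_stream(samples: Iterable[int]) -> bytes:
    """Form a mu-law PCM byte stream from processed linear samples."""
    return bytes(linear_to_ulaw(s) for s in samples)
```

For example, `linear_to_ulaw(1000)` gives `0xCE`, which decodes to 988, the midpoint of the segment-3 interval containing 1000.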

30. Method according to claim 27, wherein part of the
extracted coded digitised analogue signals are
non-parameterised coded digitised analogue signals
corresponding to the extracted parameterised coded
digitised analogue signals, wherein the parameterised
coded digitised analogue signals are processed in the
parameter domain if a greater quality improvement is
expected from processing in the parameter domain,
wherein the non-parameterised coded digitised
analogue signals are processed in the linear domain,
wherein the processed non-parameterised coded
digitised analogue signals are reinserted into the
digital network, and wherein the processed
parameterised coded digitised analogue signals are
reinserted into the digital network if a greater
quality improvement is expected from processing in the
parameter domain.
31. Method according to one of claims 27 to 30, wherein
the quality improvement of a processing in the linear
and in the parameter domain is determined by
analysing the digitised analogue signal before and
after processing in the linear and in the parameter
domain.

32. Method according to one of claims 27 to 31, wherein
the quality improvement of a processing in the linear
and in the parameter domain is determined by using a
neural network.

33. Method according to one of claims 27 to 32, further
comprising:

- selecting those processing functions that are
suitable for an enhancement of the quality of the
digitised analogue signals in question and
carrying out only those processing functions.

34. Method according to one of claims 27 to 33, wherein
the processing in the parameter domain comprises, for
gain control, comparing gain parameters of the
extracted parameterised coded digitised analogue
signals with a desired gain, forming corresponding
new gain parameters and replacing the original
gain parameters by the new gain parameters in the
extracted parameterised coded digitised analogue
signals.

35. Method according to claim 34, comprising:

- linearising extracted parameterised coded
digitised analogue signals and providing decoded
gain parameters of the digitised analogue signals;

- estimating the level of the linearised coded
digitised analogue signals and determining desired
gain values based on the estimated signal level
and a desired target signal level;

- determining, from the decoded gain parameters of
the coded digitised analogue signals and the desired
gain values, new gain parameters suitable for
achieving the desired gain by linear-to-parameter
domain mapping; and

- re-quantising the new gain parameters and
replacing the original gain parameters with the
new gain parameters in the coded digitised
analogue signals.

36. Method according to one of claims 27 to 35, wherein
the processing in the parameter domain comprises, for
noise suppression, attenuating noise portions and
low-level signal portions of the extracted
parameterised coded digitised analogue signals in the
time domain and correspondingly replacing gain
parameters in the extracted parameterised coded
digitised analogue signals.

37. Method according to one of claims 27 to 36, wherein
the processing in the parameter domain comprises, for
noise suppression, attenuating frequency portions of
noise in the extracted parameterised coded digitised
analogue signals which have approximately the same
energy as a noise estimate and correspondingly
replacing linear prediction parameters in the
extracted parameterised coded digitised analogue
signals.

38. Method according to claim 37, comprising:

- decoding linear prediction coefficients from
extracted coded digitised analogue signals;

- estimating a long-term power spectrum of the noise
of the digitised analogue signals;

- estimating a short-term power spectrum of the
noise of the digitised analogue signals;

- determining a desired spectrum depending on the
difference between the long-term spectrum and
the short-term spectrum;

- determining new linear prediction coefficients
according to the desired spectrum; and

- quantising the new linear prediction coefficients
or converting them to line spectral pair parameters,
and substituting them for the old parameters in the
parameterised coded digitised analogue signal.

39. Method according to one of claims 27 to 38, wherein
the processing in the parameter domain comprises, for
echo suppression, extracting in addition further
parameterised coded digitised analogue signals
transmitted in the opposite direction, comparing the
first and the further parameterised coded digitised
analogue signals in order to detect echoes in the
first parameterised coded digitised analogue signals,
and replacing portions of the first parameterised
coded digitised analogue signal by generated portions
with comfort noise parameters, if an echo was
detected in the portion of the first parameterised
coded digitised analogue signal.

40. Method according to claim 39, comprising linearising
the extracted coded digitised analogue signals from
the first and the opposite direction before comparing
them.

41. Method according to one of claims 27 to 40, 
comprising extracting in addition further coded 
digitised analogue signals from the digital network 
transmitted in the opposite direction as compared to 
the extracted coded digitised analogue signals, which 
further coded digitised analogue signals include at 
least in part parameterised coded digitised analogue 
signals, attenuating an echo signal in the first 
parameterised coded digitised analogue signals making 
use of the further parameterised coded digitised 
analogue signals and suppressing the residual echo
signal.

42. Method according to claim 41, comprising linearising 
the extracted coded digitised analogue signals from 
both directions before attenuating the echo signal, 
generating comfort noise based on the result of the 
suppression and an estimated noise in the first 
digitised analogue signal and replacing a portion of 
the original first extracted signal in which an echo 
was detected with a portion comprising a 
corresponding comfort noise parameter. 

43. Method according to one of claims 39 to 42, wherein
the processing in the parameter domain comprises
bypassing the first parameterised coded digitised
analogue signals without echo detection, if there is
no signal activity in the opposite direction or if
the signal level of the further extracted
parameterised coded digitised analogue signals in the
opposite direction is below a threshold level.